JavaRanch » Java Forums » Engineering » XML and Related Technologies
How to read the content of HTML table

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235
Hi All,

I would like to retrieve the content of the following HTML table and am wondering whether there are any libraries/jars that could do the job easily, without my having to write a parser (possibly in XSLT):

I am sure there must be an easier way to do this, such as using
existing SAX/DOM/XSLT jars to retrieve these values quickly.

I have been slowly wading through two books, Learning XML by Erik T. Ray and Java & XML by Brett D. McLaughlin & Justin Edelson, but would like to delve into the relevant chapters and bypass anything that is not relevant to my current task in the XML area, in order to fast-track development.

I am new to XML and would very much appreciate it if you could point me to the specific area to focus on to get just this job done.

Many thanks,

William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13027
If this was my problem I would use the JTidy toolkit. It can parse most HTML into a DOM that you can pull data from. It is pretty good at coping with less than perfect HTML.
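
For illustration, here is a rough sketch of what "pull data from the DOM" could look like once a parser has handed back an org.w3c.dom.Document. The class and method names (TableReader, parseXhtml, readTable, text) are made up for this example. To keep it runnable on a bare JDK, the sample page is parsed with the JDK's own DocumentBuilder, which only accepts well-formed markup; with JTidy on the classpath the Document would instead come from Tidy.parseDOM, as noted in the comment. The text is collected with basic DOM Level 1 calls only, on the assumption that newer calls such as getTextContent may not be available in every DOM implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TableReader {

    // Stand-in for JTidy so the sketch runs without extra jars (well-formed input only).
    // With JTidy, the Document would come from something like:
    //   Document doc = new org.w3c.tidy.Tidy().parseDOM(inputStream, null);
    static Document parseXhtml(String xhtml) throws Exception {
        byte[] bytes = xhtml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    // Returns one list of cell texts per <tr> found anywhere in the document.
    static List<List<String>> readTable(Document doc) {
        List<List<String>> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                String name = child.getNodeName();
                // Only direct <td>/<th> children count as cells of this row.
                if ("td".equalsIgnoreCase(name) || "th".equalsIgnoreCase(name)) {
                    cells.add(text(child).trim());
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    // Concatenates all descendant text nodes by hand.
    static String text(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue();
        }
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(text(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><table>"
                + "<tr><th>Name</th><th>Qty</th></tr>"
                + "<tr><td>Apples</td><td><b>3</b></td></tr>"
                + "</table></body></html>";
        for (List<String> row : readTable(parseXhtml(html))) {
            System.out.println(row);
        }
    }
}
```

Note that this collects every tr in the page, so a page with several tables (or nested ones) would need the walk to start from the specific table element instead of the whole document.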

Jack Bush
Ranch Hand

Joined: Oct 20, 2006
Posts: 235

I will do more research to find the simplest method to use instead. Nevertheless, thanks for your input.

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42959
I might use jWebUnit for making sense of HTML. It puts a nice API on top of the page that's easier to use than dealing with XML. Don't be put off that it's billed as a testing tool - using it to access HTML pages works just fine. Actually, I think it may use JTidy underneath as well.