
How to read the content of HTML table

 
Jack Bush
Ranch Hand
Posts: 235
Hi All,

I would like to retrieve the content of the following HTML table and am wondering whether there are any libraries/jars that could do the job easily, without my having to write a parser, possibly in XSLT:


I am sure there must be an easier way to do this, such as using existing SAX/DOM/XSLT jars to retrieve these values quickly.

I have been slowly wading through two books, Learning XML by Erik T. Ray (http://www.oreilly.com/catalog/learnxml2/toc.html) and Java & XML by Brett D. McLaughlin & Justin Edelson (http://www.oreilly.com/catalog/9780596101497/toc.html), but I would like to delve into the relevant chapters and bypass anything that is not relevant to my current task in the XML area, in order to fast-track development.

I am new to XML and would very much appreciate it if you could point me to the specific area to focus on to get just this job done.

Many thanks,

Jack
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13055
6
If this were my problem I would use the JTidy toolkit. It can parse most HTML into a DOM that you can pull data from, and it is pretty good at coping with less-than-perfect HTML.
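To give a rough idea of what "pulling data from a DOM" looks like, here is a minimal sketch that uses only the standard W3C DOM classes shipped with the JDK. It assumes the table markup is already well-formed, so the JDK's built-in XML parser stands in for JTidy here; with real-world HTML you would get the Document from JTidy instead and the walking code should look much the same. The class name TableReader, the method readTable, and the sample table are made up for illustration, and the lowercase tag names are an assumption about the input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JTidy or any other library.
class TableReader {

    // Parses well-formed markup into a DOM and returns the cell text, row by row.
    static List<List<String>> readTable(String markup) throws Exception {
        // Stand-in for JTidy: the JDK parser only accepts well-formed input.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));

        List<List<String>> rows = new ArrayList<>();
        // Finds every <tr> in the document, including rows of nested tables.
        NodeList trs = doc.getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes; keep only direct <td>/<th> children.
                if (child instanceof Element) {
                    String name = child.getNodeName();
                    if (name.equals("td") || name.equals("th")) {
                        cells.add(child.getTextContent().trim());
                    }
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, since the original table is not shown in the thread.
        String sample = "<table>"
                + "<tr><th>Name</th><th>Qty</th></tr>"
                + "<tr><td>Apple</td><td>3</td></tr>"
                + "</table>";
        for (List<String> row : readTable(sample)) {
            System.out.println(row);
        }
    }
}
```

Returning a list of rows keeps the parsing step separate from whatever you do with the values afterwards, so swapping in a different parser should only affect the first two lines of readTable.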

Bill
 
Jack Bush
Ranch Hand
Posts: 235
Hi,

I will do more research to find the simplest method to use instead. Nevertheless, thanks for your input.

Jack
 
Ulf Dittmer
Rancher
Pie
Posts: 42967
73
I might use jWebUnit for making sense of HTML. It puts a nice API on top of the page that's easier to use than dealing with XML. Don't be put off that it's billed as a testing tool; using it to access HTML pages works just fine. Actually, I think it may use JTidy underneath as well.
 