What would be the best way to parse HTML content?

 
Kiran Shirali
Ranch Hand
Posts: 34
Hi Everyone,

I need to parse three or four HTML pages to extract data from them.

An example of the pages is:


In this case, what I could do is:



Then, reading the page line by line, I can check each line for the classes 'value' and 'symbol'.

What I want to know is whether there is a more efficient way to do this. The class names may change in the future, so I don't want my application to be tightly coupled to the HTML page.

Does anybody have any suggestions?
 
Ulf Dittmer
Rancher
Posts: 42967
HtmlUnit
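
To address the coupling concern, one option is to keep the selectors out of the parsing code and look elements up with XPath instead of scanning lines. The following is only a rough sketch using the JDK's built-in XML and XPath classes, not HtmlUnit itself. The sample markup, the `ConfigurableExtractor` class and the `extract` method are made up for illustration, since the original page is not shown in this thread, and the JDK parser only accepts well-formed markup. Real pages are often not well-formed, which is where a tolerant parser such as HtmlUnit helps; the same XPath strings should carry over to it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConfigurableExtractor {

    // Hypothetical, well-formed markup; the real page from the question is unknown.
    static final String SAMPLE =
        "<html><body><div>"
      + "<span class=\"symbol\">ABC</span>"
      + "<span class=\"value\">12.5</span>"
      + "</div></body></html>";

    // Evaluates one XPath expression per field name and returns the text found.
    // The map could be loaded from a properties file so that a change in class
    // names only requires a configuration change, not a code change.
    static Map<String, String> extract(String xhtml, Map<String, String> fieldToXPath)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fieldToXPath.entrySet()) {
            // evaluate(String, Object) returns the string value of the first match
            result.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc).trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("symbol", "//span[@class='symbol']");
        selectors.put("value", "//span[@class='value']");

        System.out.println(extract(SAMPLE, selectors)); // prints {symbol=ABC, value=12.5}
    }
}
```

If the page later renames its classes, only the entries in the selector map need to change.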
 