| Author |
Light Weight HTML parsing
|
Anand Athinarayanan
Greenhorn
Joined: May 20, 2011
Posts: 27
|
|
Hi,
I'm writing a Java program that will fetch the contents of an HTML web page, check a value inside a div (say, a date), and then do some processing that is unrelated to HTML.
What is the easiest way to do this? Should I use a full-fledged HTML parser, given that I only want to check the value of a single div element? I don't need the rest of the HTML content.
Are there any alternatives to the open source HTML parsers? If an HTML parser is unavoidable, which is the most lightweight among them?
Thanks !
PS: I'm not sure if this is the right sub-forum; please move it to the correct one.
|
 |
Jaikiran Pai
Marshal
Joined: Jul 20, 2005
Posts: 8209
|
|
I haven't used it, but I have heard about HtmlUnit, and judging by its getting started guide it looks simple enough to use. Take a look at the javadoc, in particular the HtmlPage, HtmlElement and HtmlDivision classes.
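If a library turns out to be more than is needed for one div, a rough sketch using only the JDK might look like the code below. It is not HtmlUnit code. It assumes the page has already been downloaded into a String, that the target div has a known id ("date" here is made up for illustration), and that the div contains no nested div elements. The class and method names (DivExtractor, extractDivText) are hypothetical. A regular expression like this is fragile against arbitrary markup, so it is only reasonable when the page layout is simple and stable.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivExtractor {

    /**
     * Returns the text inside the first div whose id attribute equals divId.
     * Assumption: the div does not contain nested div elements.
     */
    public static Optional<String> extractDivText(String html, String divId) {
        // Match <div ... id="divId" ...> up to the nearest closing </div>.
        Pattern pattern = Pattern.compile(
                "<div\\b[^>]*\\sid\\s*=\\s*([\"'])" + Pattern.quote(divId)
                        + "\\1[^>]*>(.*?)</div>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // Drop inline tags such as <b> or <span>, then trim whitespace.
        String text = matcher.group(2).replaceAll("<[^>]+>", "").trim();
        return Optional.of(text);
    }

    public static void main(String[] args) {
        // Hypothetical page content; in practice this would be the downloaded HTML.
        String html = "<html><body><div id=\"header\">News</div>"
                + "<div class=\"x\" id=\"date\"> <b>2011-05-20</b> </div></body></html>";
        // Prints: 2011-05-20
        System.out.println(extractDivText(html, "date").orElse("(not found)"));
    }
}
```

If the page uses nested divs, or the markup changes often, a real parser such as HtmlUnit is likely the safer choice.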
|
|
 |
Anand Athinarayanan
Greenhorn
Joined: May 20, 2011
Posts: 27
|
|
Hi Pai,
I'm very sorry for the late reply. Thank you for your suggestion. Will give it a try.
|
 |
Ratan java
Greenhorn
Joined: Jun 19, 2012
Posts: 5
|
|
|
Good solution.
|
 |