Light Weight HTML parsing

Anand Athinarayanan
Greenhorn

Joined: May 20, 2011
Posts: 27
Hi,

I am writing a Java program that fetches an HTML web page, checks a value inside a div (say, a date), and then does some processing unrelated to HTML.
What is the easiest way to do this? Should I use a full-fledged HTML parser, given that I only need the value of one div element and none of the rest of the HTML content?

Are there any alternatives to open source HTML parsers? If an HTML parser is unavoidable, which is the most lightweight one?

Thanks !

PS: I'm not sure if this is the right subforum; if it isn't, please move it to the correct one.
Jaikiran Pai
Marshal

Joined: Jul 20, 2005
Posts: 8209

I haven't used it, but I have heard about HtmlUnit, and judging by its getting started guide, it looks simple enough to use. Take a look at its Javadoc, in particular the HtmlPage, HtmlElement and HtmlDivision classes.
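
On the question of alternatives to third-party parsers: for a single div, one option that needs no extra library is the HTML parser bundled with the JDK (javax.swing.text.html.parser.ParserDelegator). Below is a minimal sketch of that idea, not a tested recommendation. The class name DivTextSketch, the method textOfDiv, the div id "date" and the sample markup are all made up for illustration, and the sketch assumes the target div carries an id attribute. The bundled parser is built around an HTML 3.2 DTD, so it may cope poorly with modern pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the thread): collects the text of one div by id.
class DivTextSketch {

    // Returns the trimmed text inside the first <div id="divId">, or empty if none is found.
    static Optional<String> textOfDiv(String html, String divId) {
        StringBuilder text = new StringBuilder();
        boolean[] found = {false};

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // > 0 while we are inside the target div

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.DIV) {
                    return;
                }
                if (depth > 0) {
                    depth++; // nested div inside the target
                } else if (!found[0] && divId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
                    found[0] = true;
                    depth = 1;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    text.append(data);
                }
            }
        };

        try {
            // 'true' tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found[0] ? Optional.of(text.toString().trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<html><body><div id=\"header\">News</div>"
                + "<div id=\"date\">2012-06-19</div></body></html>";
        System.out.println(textOfDiv(html, "date").orElse("(not found)"));
    }
}
```

For a real page, the StringReader could be replaced by a Reader over the downloaded content. If the page is not well behaved, or the div has to be located by something other than its id, a dedicated library such as HtmlUnit is probably the safer choice.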

Anand Athinarayanan
Greenhorn

Joined: May 20, 2011
Posts: 27
Hi Pai,

I'm very sorry for the late reply. Thank you for your suggestion. Will give it a try.
Ratan java
Greenhorn

Joined: Jun 19, 2012
Posts: 5
good solution...
 