Logic of HTML Parsing

Lalit Nagalkar
Ranch Hand

Joined: Aug 22, 2006
Posts: 47
Hi all,

I want to create a class that can parse an HTML page and build a tree structure showing all the elements along with their attributes and data (such as links to files or text), if any.

I am aware that many people have already built this.
I don't want complete code, just the logic of how it's done and some code snippets.

I will be thankful to all of you for the help.
I hope you have understood what I mean. Please ask if anything needs elaboration.

Lalit Nagalkar

SCJP 1.4
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
As you said, there are a number of decent libraries available that do this (like jTidy, TagSoup, NekoXNI, ...). The easiest might be to study their approach; I'm sure you'd get a wide range of ideas from that.
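For a feel of the general logic before reading those libraries, here is a minimal sketch. It is not how jTidy, TagSoup or Neko work internally; it only shows the common idea of scanning for tags and keeping a stack of open elements so each new node is attached to its parent. The names TinyHtmlTree, Node, parse and dump are made up for this example, and it assumes reasonably well-formed input with quoted attribute values (no script or style content, no unquoted attributes, no entity decoding).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: a toy HTML-to-tree builder, not a real-world parser.
class TinyHtmlTree {

    /** One tree node: an element (tag != null) or a text node (tag == null). */
    static class Node {
        final String tag;
        final String text;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String tag, String text) { this.tag = tag; this.text = text; }
    }

    // Comment, or <!DOCTYPE ...>, or a tag: optional '/', name, raw attribute text, optional '/'.
    private static final Pattern TAG = Pattern.compile(
            "<!--.*?-->|<![^>]*>|<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*?)(/?)>", Pattern.DOTALL);
    // name="value" or name='value' (unquoted values are ignored in this sketch).
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");
    // Elements that never have a closing tag (partial list, enough for the demo).
    private static final List<String> VOID =
            Arrays.asList("br", "img", "hr", "input", "meta", "link");

    static Node parse(String html) {
        Node root = new Node("#document", null);
        Deque<Node> open = new ArrayDeque<>();   // stack of currently open elements
        open.push(root);
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            addText(open.peek(), html.substring(pos, m.start()));
            pos = m.end();
            if (m.group(2) == null) continue;    // comment or doctype: skip it
            String name = m.group(2).toLowerCase();
            if (!m.group(1).isEmpty()) {         // closing tag
                boolean isOpen = false;
                for (Node n : open) if (name.equals(n.tag)) isOpen = true;
                if (isOpen) {                    // pop up to and including the match
                    Node closed;
                    do { closed = open.pop(); } while (!closed.tag.equals(name));
                }                                // stray closing tags are ignored
            } else {                             // opening or self-closing tag
                Node el = new Node(name, null);
                Matcher a = ATTR.matcher(m.group(3));
                while (a.find()) {
                    String value = a.group(2) != null ? a.group(2) : a.group(3);
                    el.attrs.put(a.group(1).toLowerCase(), value);
                }
                open.peek().children.add(el);
                if (m.group(4).isEmpty() && !VOID.contains(name)) open.push(el);
            }
        }
        addText(open.peek(), html.substring(pos));
        return root;
    }

    // Adds a text node, trimmed; whitespace-only runs are dropped.
    private static void addText(Node parent, String raw) {
        String t = raw.trim();
        if (!t.isEmpty()) parent.children.add(new Node(null, t));
    }

    // Writes the tree as an indented outline, one node per line.
    static void dump(Node n, String indent, StringBuilder out) {
        if (n.tag == null) {
            out.append(indent).append('"').append(n.text).append("\"\n");
        } else {
            out.append(indent).append(n.tag);
            if (!n.attrs.isEmpty()) out.append(' ').append(n.attrs);
            out.append('\n');
        }
        for (Node c : n.children) dump(c, indent + "  ", out);
    }

    public static void main(String[] args) {
        String html = "<html><body><h1 id=\"top\">Docs</h1>"
                + "<p>Get the <a href=\"files/guide.pdf\">guide</a>.<br>Bye</p></body></html>";
        StringBuilder out = new StringBuilder();
        dump(parse(html), "", out);
        System.out.print(out);
    }
}
```

Real pages are much messier than this (unclosed tags, unquoted attributes, scripts), which is exactly the part the libraries above spend most of their code on.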
Akhilesh Trivedi
Ranch Hand

Joined: Jun 22, 2005
Posts: 1599
In addition to Ulf's comments, you may like to check this out as well.

Keep Smiling Always — My life is smoother when running silent. -paul