Logic of HTML Parsing

Lalit Nagalkar
Ranch Hand

Joined: Aug 22, 2006
Posts: 47
Hi all,

I want to write a class that can parse an HTML page and build a tree structure displaying all the elements along with their attributes and data (such as links to files, or text), if any.

I am aware that many people have already built this.
I don't want complete code, just the logic of how it's done and some code snippets.

I will be thankful to all of you for your help.
I hope it's clear what I mean; if anything needs elaboration, please ask.

Lalit Nagalkar

SCJP 1.4
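The usual logic, which libraries such as jTidy and TagSoup refine heavily, has two steps. First, a tokenizer splits the input into tags and the text between them. Second, a stack of currently open elements turns those tokens into a tree: a start tag creates a node, which is attached to the element on top of the stack and then pushed; text becomes a leaf of the top element; an end tag pops back to its matching element. Below is a minimal sketch of that idea, assuming a regex-based tokenizer is good enough for reasonably well-formed pages. `SimpleHtmlTree`, `Node`, `parse` and `print` are hypothetical names, not an existing API. It deliberately skips a lot: entities are not decoded, comments and doctypes are dropped, `<script>` bodies are treated as markup, a `>` inside an attribute value breaks tokenizing, and HTML's implied end tags are unknown (an unclosed `<li>` nests the next `<li>` inside it). Repairing exactly those cases is much of what the real libraries do.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleHtmlTree {

    /** A tree node: an element (tag != null) or a text leaf (tag == null). */
    static class Node {
        final String tag;
        final String text;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String tag, String text) { this.tag = tag; this.text = text; }
    }

    // A comment, a <!...> declaration such as DOCTYPE, or a tag where
    // group 1 = "/" for end tags, group 2 = tag name, group 3 = raw attribute text.
    private static final Pattern TAG = Pattern.compile(
            "<!--.*?-->|<!.*?>|<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>", Pattern.DOTALL);

    // An attribute name, optionally followed by ="double", ='single' or an unquoted value.
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)(?:\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+)))?");

    // Elements that have no content and no end tag, so they are never pushed.
    private static final Set<String> VOID = new HashSet<>(Arrays.asList(
            "br", "img", "hr", "input", "meta", "link", "area", "base", "col", "param"));

    static Node parse(String html) {
        Node root = new Node("#document", null);
        Deque<Node> open = new ArrayDeque<>();  // stack of currently open elements
        open.push(root);
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            addText(open.peek(), html.substring(last, m.start()));
            last = m.end();
            if (m.group(2) == null) continue;   // comment or declaration: skipped
            String name = m.group(2).toLowerCase();
            if (m.group(1).equals("/")) {
                // End tag: pop back to the matching element, implicitly closing anything
                // still open inside it. A stray end tag with no open match is ignored.
                if (isOpen(open, name)) {
                    while (!open.peek().tag.equals(name)) open.pop();
                    open.pop();
                }
            } else {
                Node element = new Node(name, null);
                parseAttributes(m.group(3), element.attributes);
                open.peek().children.add(element);
                boolean selfClosing = m.group(3).trim().endsWith("/");
                if (!VOID.contains(name) && !selfClosing) open.push(element);
            }
        }
        addText(open.peek(), html.substring(last));
        return root;
    }

    private static boolean isOpen(Deque<Node> open, String name) {
        for (Node n : open) if (name.equals(n.tag)) return true;
        return false;
    }

    private static void addText(Node parent, String raw) {
        String text = raw.trim();               // whitespace-only runs are dropped
        if (!text.isEmpty()) parent.children.add(new Node(null, text));
    }

    private static void parseAttributes(String raw, Map<String, String> out) {
        Matcher a = ATTR.matcher(raw);
        while (a.find()) {
            String value = a.group(2) != null ? a.group(2)
                    : a.group(3) != null ? a.group(3)
                    : a.group(4) != null ? a.group(4)
                    : "";                       // bare attribute such as "checked"
            out.put(a.group(1).toLowerCase(), value);
        }
    }

    /** Appends the tree to out, one node per line, indented two spaces per level. */
    static void print(Node n, String indent, StringBuilder out) {
        if (n.tag == null) {
            out.append(indent).append('"').append(n.text).append("\"\n");
            return;
        }
        out.append(indent).append(n.tag);
        if (!n.attributes.isEmpty()) out.append(' ').append(n.attributes);
        out.append('\n');
        for (Node child : n.children) print(child, indent + "  ", out);
    }

    public static void main(String[] args) {
        String html = "<html><body><p class=\"intro\">Hello <a href=\"a.html\">link</a><br></p></body></html>";
        StringBuilder out = new StringBuilder();
        print(parse(html), "", out);
        System.out.print(out);
    }
}
```

Running `main` prints one node per line, indented by depth: `#document`, `html`, `body`, then `p {class=intro}` containing `"Hello"`, `a {href=a.html}` (with its text `"link"`) and `br`. The stack is what lets the end-tag handling tolerate sloppy markup: an element left unclosed is simply popped when an enclosing end tag arrives.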
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 41600
As you said, there are a number of decent libraries available that do this (like jTidy, TagSoup, NekoXNI, ...). The easiest might be to study their approach; I'm sure you'd get a wide range of ideas from that.
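TagSoup, for instance, is a SAX parser: rather than building a tree itself, it reports events (start tag, end tag, text) and leaves tree building to whoever listens, which is the same stack technique as in a hand-written parser. The JDK happens to ship a lenient event-based HTML parser as well, `javax.swing.text.html.parser.ParserDelegator`, which reports to an `HTMLEditorKit.ParserCallback`. It only knows HTML 3.2, so the following is a minimal sketch of the event-to-tree approach under that assumption, not a recommendation over the libraries above. `SwingHtmlTree`, `Node`, `TreeBuilder` and `find` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.*;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingHtmlTree {

    /** A tree node: an element (tag != null) or a text leaf (tag == null). */
    static class Node {
        final String tag;
        final String text;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String tag, String text) { this.tag = tag; this.text = text; }
    }

    /** Turns the parser's start/end/text callbacks into a tree with a stack of open elements. */
    static class TreeBuilder extends HTMLEditorKit.ParserCallback {
        final Node root = new Node("#document", null);
        private final Deque<Node> open = new ArrayDeque<>();

        TreeBuilder() { open.push(root); }

        private Node addElement(HTML.Tag t, MutableAttributeSet attrs) {
            Node n = new Node(t.toString(), null);
            Enumeration<?> names = attrs.getAttributeNames();
            while (names.hasMoreElements()) {
                Object name = names.nextElement();
                n.attributes.put(name.toString(), String.valueOf(attrs.getAttribute(name)));
            }
            open.peek().children.add(n);
            return n;
        }

        private boolean isOpen(String name) {
            for (Node n : open) if (name.equals(n.tag)) return true;
            return false;
        }

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet attrs, int pos) {
            open.push(addElement(t, attrs));
        }

        @Override
        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet attrs, int pos) {
            addElement(t, attrs);  // content-less tags such as <br> or <img> become leaves
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            String name = t.toString();
            if (!isOpen(name)) return;  // ignore end tags with no matching open element
            while (!name.equals(open.peek().tag)) open.pop();
            open.pop();
        }

        @Override
        public void handleText(char[] data, int pos) {
            open.peek().children.add(new Node(null, new String(data)));
        }
    }

    static Node parse(String html) {
        TreeBuilder builder = new TreeBuilder();
        try {
            new ParserDelegator().parse(new StringReader(html), builder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected when reading from a String
        }
        return builder.root;
    }

    /** Depth-first search for the first element with the given tag name, or null. */
    static Node find(Node n, String tag) {
        if (tag.equals(n.tag)) return n;
        for (Node child : n.children) {
            Node hit = find(child, tag);
            if (hit != null) return hit;
        }
        return null;
    }

    public static void main(String[] args) {
        Node root = parse("<p>See <a href=\"a.html\">the file</a><br></p>");
        Node link = find(root, "a");
        System.out.println(link.attributes.get("href") + " -> " + link.children.get(0).text);
    }
}
```

The `main` method parses a small fragment, looks up the `<a>` element and prints its `href` together with its text. Splitting the work this way keeps the builder simple: the parser has already decided where elements start and end (including implied `html` and `body` elements), so the stack mostly just mirrors those decisions.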

Akhilesh Trivedi
Ranch Hand

Joined: Jun 22, 2005
Posts: 1526
In addition to Ulf's comments, you may want to check this out as well.

Keep Smiling Always — My life is smoother when running silent. -paul