File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Java in General and the fly likes Logic of HTML Parsing Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Logic of HTML Parsing" Watch "Logic of HTML Parsing" New topic
Author

Logic of HTML Parsing

Lalit Nagalkar
Ranch Hand

Joined: Aug 22, 2006
Posts: 47
HI all,

I want to create a class able to parse HTML page and create a tree structure to display all the elements along with their attributes and data (like links to files, or text etc), if any.

I am aware that many have designed this thing.
I don't want complete code, but logic how it's done and some code snipets.

I will be thankfull to all you friends for the help.
I hope you have understood what I mean. For any elaboration pease ask.

Thanks.
Lalit Nagalkar


SCJP 1.4
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41600
    
  55
As you said, there are a number of decent libraries available that do this (like jTidy, TagSoup, NekoXNI, ...). The easiest might be to study their approach; I'm sure you'd get a wide range of ideas from that.


Ping & DNS - my free Android networking tools app
Akhilesh Trivedi
Ranch Hand

Joined: Jun 22, 2005
Posts: 1526
In addition to Ulf's comments, you may like to check out this as well.


Keep Smiling Always — My life is smoother when running silent. -paul
[FAQs] [Certification Guides] [The Linux Documentation Project]
 
Don't get me started about those stupid light bulbs.
 
subject: Logic of HTML Parsing