Logic of HTML Parsing

Ranch Hand
Posts: 47
Hi all,

I want to create a class that can parse an HTML page and build a tree structure showing all the elements, along with their attributes and data (such as links to files, or text), if any.

I am aware that many people have already built this kind of thing.
I don't want complete code, just the logic of how it's done and some code snippets.

I will be thankful to all of you for the help.
I hope you understand what I mean. If anything needs elaboration, please ask.

Lalit Nagalkar
Posts: 43016
As you said, there are a number of decent libraries available that do this (like jTidy, TagSoup, NekoXNI, ...). The easiest might be to study their approach; I'm sure you'd get a wide range of ideas from that.
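The core idea these libraries share can be sketched in a few lines: scan the markup for tags, keep a stack of currently open elements, attach each new element or text run to whatever is on top of the stack, and pop when a closing tag arrives. The Java below is a minimal, hypothetical sketch of that idea (the class `SimpleHtmlTree` and its methods are made up for illustration, not taken from any of the libraries named above). It recovers from missing end tags by closing everything up to the matching open element and ignores stray end tags, but it does not apply HTML's implied-end-tag rules (an unclosed `<p>` or `<li>` ends up nesting the next one), and it does not handle comments, `<script>` bodies, or entity decoding. Those cases are where real parsers spend most of their code. It needs Java 9+ for `Set.of`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal tree builder: regex tokenizer plus a stack of open elements. */
public class SimpleHtmlTree {

    /** An element node (tag != null) or a text node (text != null). */
    static final class Node {
        final String tag;
        final String text;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String tag, String text) { this.tag = tag; this.text = text; }
    }

    // Elements that never get a closing tag, so they are never pushed on the stack.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    // <tag ...>, </tag> or <tag .../>; a '>' inside an attribute value is not handled.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(/?)>");
    // name, name=value, name="value" or name='value'
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*(?:=\\s*(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?");

    public static Node parse(String html) {
        Node root = new Node("#document", null);
        Deque<Node> open = new ArrayDeque<>();   // stack of currently open elements
        open.push(root);
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            addText(open.peek(), html.substring(pos, m.start()));
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {                      // start tag
                Node el = new Node(name, null);
                Matcher a = ATTR.matcher(m.group(3));
                while (a.find()) {
                    String v = a.group(2);
                    if (v == null) v = "";                   // attribute without a value
                    else if (v.startsWith("\"") || v.startsWith("'")) v = v.substring(1, v.length() - 1);
                    el.attributes.put(a.group(1).toLowerCase(), v);
                }
                open.peek().children.add(el);
                if (m.group(4).isEmpty() && !VOID.contains(name)) open.push(el);
            } else if (open.stream().anyMatch(n -> name.equals(n.tag))) {
                // End tag: close it and any elements left unclosed inside it.
                Node closed;
                do { closed = open.pop(); } while (!name.equals(closed.tag));
            }                                                // a stray end tag is ignored
            pos = m.end();
        }
        addText(open.peek(), html.substring(pos));
        return root;
    }

    private static void addText(Node parent, String raw) {
        String t = raw.trim();                               // entities are not decoded
        if (!t.isEmpty()) parent.children.add(new Node(null, t));
    }

    static void print(Node n, String indent) {
        if (n.tag == null) {
            System.out.println(indent + "\"" + n.text + "\"");
            return;
        }
        System.out.println(indent + n.tag + (n.attributes.isEmpty() ? "" : " " + n.attributes));
        for (Node c : n.children) print(c, indent + "  ");
    }

    public static void main(String[] args) {
        print(parse("<p class=\"intro\">Hello <a href='docs.pdf'>docs</a><br></p>"), "");
    }
}
```

Running `main` prints the tree with two spaces of indentation per level:

```text
#document
  p {class=intro}
    "Hello"
    a {href=docs.pdf}
      "docs"
    br
```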
Ranch Hand
Posts: 1609
In addition to Ulf's comments, you may like to check out this as well.
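One more option, not mentioned in the thread, is the HTML 3.2 parser that ships with the JDK in `javax.swing.text.html.parser`. `ParserDelegator.parse` reports start tags, end tags, simple (empty) tags and text to an `HTMLEditorKit.ParserCallback`, so the same stack idea can build a tree from those callbacks. The sketch below is hypothetical (`JdkHtmlTree`, `TreeBuilder` and `find` are illustrative names). Since that parser only knows HTML 3.2 and inserts implied `html`/`body` elements itself, treat it as a quick experiment rather than a replacement for the libraries listed above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class JdkHtmlTree {

    static final class Node {
        final String name;                       // tag name, or "#text" for text nodes
        final String text;                       // null for element nodes
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    /** Turns the parser's callbacks into a tree, using a stack of open elements. */
    static final class TreeBuilder extends HTMLEditorKit.ParserCallback {
        final Node root = new Node("#document", null);
        private final Deque<Node> open = new ArrayDeque<>();
        TreeBuilder() { open.push(root); }

        private Node element(HTML.Tag t, MutableAttributeSet a) {
            Node n = new Node(t.toString(), null);
            for (Enumeration<?> e = a.getAttributeNames(); e.hasMoreElements(); ) {
                Object key = e.nextElement();
                if (IMPLIED.equals(key)) continue;   // marker for tags the parser inserted itself
                n.attributes.put(key.toString(), String.valueOf(a.getAttribute(key)));
            }
            open.peek().children.add(n);
            return n;
        }

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            open.push(element(t, a));
        }

        @Override
        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            element(t, a);                           // e.g. <br>, <img>: never has children
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            String name = t.toString();
            if (open.stream().anyMatch(n -> name.equals(n.name))) {
                Node closed;
                do { closed = open.pop(); } while (!name.equals(closed.name));
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            open.peek().children.add(new Node("#text", new String(data)));
        }
    }

    public static Node parse(String html) {
        TreeBuilder builder = new TreeBuilder();
        try {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), builder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.root;
    }

    /** Depth-first search for the first element with the given tag name. */
    static Node find(Node n, String name) {
        if (name.equals(n.name)) return n;
        for (Node c : n.children) {
            Node hit = find(c, name);
            if (hit != null) return hit;
        }
        return null;
    }

    public static void main(String[] args) {
        Node link = find(parse("<p>Hello <a href=\"docs.pdf\">docs</a></p>"), "a");
        System.out.println(link == null ? "no link" : link.attributes);
    }
}
```

Because the callbacks already arrive in document order with end tags reported, the builder only has to mirror them onto a stack; the tokenizing and error recovery are left to the JDK parser.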