Creating HTML Parser

harshada patil
Ranch Hand

Joined: Mar 12, 2011
Posts: 96
As my final-year project I created a web browser, but with the help of a third-party parser. As part of further development I want to write an HTML parser in Java, but I don't know how to proceed. Please help me with this.
Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10403
    

Have you listed all the features your parser should have?

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

I'd start by checking some existing open source Java HTML parsers. If you can't use them directly in your code, you can at least check out how they've done it.


Maneesh Godbole
Saloon Keeper

Joined: Jul 26, 2007
Posts: 10403
    

Um..wouldn't that defeat the purpose of writing your own parser?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

I'm not saying the entire code should be copy-pasted, but it could be used for hints on how to do it. For something like this, I wouldn't completely reinvent the wheel, not even as part of a school project.
harshada patil
Ranch Hand

Joined: Mar 12, 2011
Posts: 96
I want to try developing my own HTML parser, starting with minimal functionality and then moving towards more advanced functionality, but I don't see how to proceed or what my first step should be.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

First of all you need to know how parsers work in general. Next you need to decide whether you want to do strict parsing (i.e. reject anything which doesn't conform to the HTML spec) or lenient parsing (i.e. accept anything which vaguely resembles HTML). Then you need a grammar for whatever you decided there. Finally you need to write a parser based on that grammar.
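To make those steps concrete, here is a minimal sketch of the strict route, assuming a deliberately tiny grammar that is nowhere near real HTML: document := node*, node := element | text, element := '<' name '>' node* '</' name '>'. There are no attributes, comments, entities, or void elements such as <br>. The class name TinyHtmlParser, its Node type, and the "#root" wrapper are all made up for illustration. It is a hand-written recursive-descent parser with one method per grammar rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class: a strict recursive-descent parser for a toy HTML subset.
class TinyHtmlParser {

    /** A tree node: an element (name != null) or a text run (text != null). */
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        @Override
        public String toString() {
            if (name == null) {
                return "\"" + text + "\"";
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String input;
    private int pos = 0;

    private TinyHtmlParser(String input) {
        this.input = input;
    }

    /** Parses the input and returns a synthetic root element named "#root". */
    static Node parse(String input) {
        TinyHtmlParser p = new TinyHtmlParser(input);
        Node root = new Node("#root", null);
        p.parseNodes(root);
        if (p.pos < input.length()) {
            // Strict mode: a closing tag with no matching open tag is an error.
            throw new IllegalArgumentException("Unexpected closing tag at offset " + p.pos);
        }
        return root;
    }

    // node* : read children until end of input or the start of a closing tag.
    private void parseNodes(Node parent) {
        while (pos < input.length() && !input.startsWith("</", pos)) {
            if (input.charAt(pos) == '<') {
                parent.children.add(parseElement());
            } else {
                parent.children.add(parseText());
            }
        }
    }

    // element := '<' name '>' node* '</' name '>'
    private Node parseElement() {
        expect("<");
        Node element = new Node(parseName(), null);
        expect(">");
        parseNodes(element);
        expect("</");
        String closing = parseName();
        if (!closing.equals(element.name)) {
            throw new IllegalArgumentException(
                    "Expected </" + element.name + "> but found </" + closing + ">");
        }
        expect(">");
        return element;
    }

    // text := one or more characters other than '<'
    private Node parseText() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '<') pos++;
        return new Node(null, input.substring(start, pos));
    }

    // name := one or more letters or digits, normalized to lower case
    private String parseName() {
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Tag name expected at offset " + pos);
        }
        return input.substring(start, pos).toLowerCase(Locale.ROOT);
    }

    private void expect(String token) {
        if (!input.startsWith(token, pos)) {
            throw new IllegalArgumentException("Expected '" + token + "' at offset " + pos);
        }
        pos += token.length();
    }

    public static void main(String[] args) {
        // Prints: #root(p("Hello ", b("world")))
        System.out.println(parse("<p>Hello <b>world</b></p>"));
    }
}
```

Because this sketch is strict, input such as <p><b>x</p></b> is rejected with an IllegalArgumentException. A lenient parser would need recovery rules at exactly that point (for example, closing the open <b> implicitly), which is where most of the extra work in real-world HTML parsing tends to go.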
harshada patil
Ranch Hand

Joined: Mar 12, 2011
Posts: 96
Thanks Paul..

Please suggest some material on how parsers work in general. I searched for it, but it is hard to find.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Am I correct in guessing that you know approximately nothing about parsers? Then start with the Wikipedia article: Parser.
harshada patil
Ranch Hand

Joined: Mar 12, 2011
Posts: 96
I'm a software engineer, and I know about parsers in theory (I learned the concept in a compiler construction course). I know how to create a grammar too, but I have never tried designing a parser before, so..
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Okay, then it shouldn't be a problem. Just be aware that it's going to be a significant amount of work, so asking vague and general questions on forums is unlikely to advance that process.
 
 