HTML Parsing

Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I am attempting to parse http://www.google.com using javax.swing.text.html.HTMLEditorKit.Parser.
I am receiving a ChangedCharSetException because, apparently, the parser can't handle the <meta> tag.
"Googling" reveals a potential workaround to this problem, but I'd prefer not to use it, since it is quite a hack.
http://www.eos.dk/pipermail/advanced-swing/2001-March/000331.html
Does anyone have a better solution to this problem?


Tony Morris
Java Q&A (FAQ, Trivia)
Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I found a solution:
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Cheers.
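
For anyone who drives the parser directly rather than through a document: as far as I know, the "IgnoreCharsetDirective" property is what HTMLEditorKit.read() consults, and the same switch is exposed as the third argument of HTMLEditorKit.Parser.parse(). The following is only a minimal sketch under that assumption. The class name CharsetTolerantParse, the helper extractText, and the sample HTML string are made up for illustration, and the input comes from a String rather than a live URL.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CharsetTolerantParse {

    // Hypothetical helper: collects the text chunks the parser reports.
    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate chunks with a space so adjacent blocks do not run together.
                text.append(data).append(' ');
            }
        };
        try (Reader in = new StringReader(html)) {
            // Third argument true: ignore the charset declared in a <meta> tag
            // instead of raising ChangedCharSetException.
            new ParserDelegator().parse(in, callback, true);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Sample input (an assumption, not the real Google page).
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
                + "</head><body><p>Hello</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Note that ignoring the directive means the Reader's own encoding is trusted, so when reading from a URL it is worth choosing the charset for the InputStreamReader explicitly.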
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    

This is off-topic here; moving to the Swing forum.


[Jess in Action][AskingGoodQuestions]
Sean Sullivan
Ranch Hand

Joined: Sep 09, 2001
Posts: 427
For HTML parsing, try this:
http://htmlparser.sourceforge.net/
 
 