HTML Parsing

Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I am attempting to parse http://www.google.com using javax.swing.text.html.HTMLEditorKit.Parser.
I am getting a ChangedCharSetException because, apparently, the parser can't handle the <meta> tag that declares the page's character set.
"Googling" turns up a workaround for this problem, but I'd rather not use it, since it is quite a hack:
http://www.eos.dk/pipermail/advanced-swing/2001-March/000331.html
Does anyone have a better solution?
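For reference, here is a minimal sketch of the parser-based approach described above, assuming the JDK's concrete HTMLEditorKit.Parser implementation, javax.swing.text.html.parser.ParserDelegator. Its parse(Reader, ParserCallback, boolean) method takes an ignoreCharSet flag as the third argument, and passing true asks the parser not to stop on a charset <meta> tag. The class name LinkExtractor and the method extractLinks are hypothetical, and the page is an inline string rather than a live fetch of google.com so the sketch runs offline.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Hypothetical helper: collects the href of every <a> start tag the parser reports.
    public static List<String> extractLinks(Reader in) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // Third argument is ignoreCharSet: true means a <meta ... charset=...>
        // tag does not raise ChangedCharSetException.
        new ParserDelegator().parse(in, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Inline page with a charset <meta> tag, standing in for a real download.
        String html = "<html><head><meta http-equiv=\"Content-Type\""
                + " content=\"text/html; charset=ISO-8859-1\"></head>"
                + "<body><a href=\"http://example.com/a\">A</a><a href=\"/b\">B</a></body></html>";
        System.out.println(extractLinks(new StringReader(html)));
    }
}
```

To parse a live page, the StringReader could be swapped for an InputStreamReader over a URL connection; decoding the bytes with the right charset is then up to the caller, since the parser is told to ignore the declaration.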


Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I found a solution:
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Cheers.
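A minimal sketch of where that property fits, assuming the page is loaded into an HTMLDocument through HTMLEditorKit.read. When reading, the kit looks up the document's "IgnoreCharsetDirective" property and passes it on to the parser as the ignoreCharSet flag. The class name DocumentLoader and the method load are hypothetical, and the HTML is an inline string instead of the real Google page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class DocumentLoader {

    // Hypothetical helper: reads HTML into a fresh HTMLDocument.
    public static HTMLDocument load(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Must be set before read(): the kit consults this property when it
        // starts parsing, so <meta> charset declarations are ignored.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, doc, 0);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><meta http-equiv=\"Content-Type\""
                + " content=\"text/html; charset=ISO-8859-1\"></head>"
                + "<body><p>Hello</p></body></html>";
        HTMLDocument doc = load(new StringReader(html));
        // Print the document's plain-text content.
        System.out.println(doc.getText(0, doc.getLength()).trim());
    }
}
```

Boolean.TRUE is used instead of new Boolean(true); the two are equivalent here, but the constructor is deprecated in current JDKs.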
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24189

This is off-topic here; moving to the Swing forum.


Sean Sullivan
Ranch Hand

Joined: Sep 09, 2001
Posts: 427
For HTML parsing, try this:
http://htmlparser.sourceforge.net/