HTML Parsing

Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I am attempting to parse http://www.google.com using javax.swing.text.html.HTMLEditorKit.Parser.
I am receiving a ChangedCharSetException because, apparently, the parser can't handle the <meta> tag.
Googling reveals a potential workaround for this problem, but I'd prefer not to use it, since it is quite a hack.
Does anyone have a better solution to this problem?

Tony Morris
Java Q&A (FAQ, Trivia)
Tony Morris
Ranch Hand

Joined: Sep 24, 2003
Posts: 1608
I found a solution:
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
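For anyone landing here later, the following is a minimal sketch of how that property might be used in context. It assumes "doc" is an HTMLDocument created by an HTMLEditorKit; the class name IgnoreCharsetDemo, the method names, and the sample HTML string are made up for illustration and are not from the original post. The second method shows a related route that I believe does the same job without a document: the ignoreCharSet flag on HTMLEditorKit.Parser.parse, here via ParserDelegator.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; names are illustrative only.
public class IgnoreCharsetDemo {

    // Route 1: load into an HTMLDocument with the charset directive ignored.
    static String parseText(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Tell the document's reader not to act on <meta ... charset=...>.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (Reader reader = new StringReader(html)) {
            kit.read(reader, doc, 0);
        }
        return doc.getText(0, doc.getLength());
    }

    // Route 2: drive the parser directly and pass ignoreCharSet = true.
    static String collectText(String html) throws Exception {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> arrives here as a simple tag; nothing to do in this sketch.
            }
        };
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample page with a charset directive in a <meta> tag (assumed input).
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\"></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println(parseText(html).trim());
        System.out.println(collectText(html).trim());
    }
}
```

For a real page you would wrap the URL's input stream in an InputStreamReader with whatever encoding you expect, since ignoring the directive means the parser will not switch encodings for you.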
Ernest Friedman-Hill
author and iconoclast

Joined: Jul 08, 2003
Posts: 24199

This is off-topic here; moving to Swing forum.

[Jess in Action][AskingGoodQuestions]
Sean Sullivan
Ranch Hand

Joined: Sep 09, 2001
Posts: 427
For HTML parsing, try this: