HTML Parsing

 
Tony Morris
Ranch Hand
Posts: 1608
I am attempting to parse http://www.google.com using javax.swing.text.html.HTMLEditorKit.Parser.
I am receiving a ChangedCharSetException because, apparently, the parser can't handle the <meta> tag.
"Googling" reveals a potential workaround to this problem, but I'd prefer not to use it, since it is quite a hack.
http://www.eos.dk/pipermail/advanced-swing/2001-March/000331.html
Does anyone have a better solution to this problem?
 
Tony Morris
Ranch Hand
Posts: 1608
I found a solution:
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Cheers.
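
For anyone landing here later, this is a minimal sketch of how that property might be used end to end. It assumes the document comes from HTMLEditorKit.createDefaultDocument() and is filled through HTMLEditorKit.read(), which checks the "IgnoreCharsetDirective" property before it starts parsing. The class name CharsetDirectiveDemo, the helper parseToText, and the sample markup are made up for illustration, and the HTML is read from a string rather than fetched from google.com.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; only the property name comes from the post above.
final class CharsetDirectiveDemo {

    // Parses the given HTML into an HTMLDocument and returns its plain text.
    static String parseToText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();

        // Ask the parser to skip <meta> charset directives instead of
        // throwing ChangedCharSetException when it meets one.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try (StringReader reader = new StringReader(html)) {
            kit.read(reader, doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // ChangedCharSetException is an IOException, so it would land here.
            throw new IllegalStateException("Could not parse HTML", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup (an assumption) with the kind of <meta> tag that
        // normally triggers the exception.
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "</head><body><p>Hello</p></body></html>";
        System.out.println(parseToText(html));
    }
}
```

If you drive the parser directly rather than going through a document, ParserDelegator.parse(Reader, HTMLEditorKit.ParserCallback, boolean) takes an ignoreCharSet flag as its third argument, which should have the same effect.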
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24211
This is off-topic here; moving to the Swing forum.
 
Sean Sullivan
Ranch Hand
Posts: 427
For HTML parsing, try this:
http://htmlparser.sourceforge.net/
 