
HTML parser

Maha Hassan
Ranch Hand

Joined: Aug 02, 2005
Posts: 133
Hi all
I am using the HTML parser, but it has a problem: it sometimes extracts JavaScript code as part of the text of the HTML page.
Do you know of a better parser?

For example, when I tried it with "http://www.google.ca/ig?hl=en", it produced the following as part of the text:
"'; _gel('t6').innerHTML = htmlmsg; } function tarot6() { var prefs = new _IG_Prefs(6); var sign = prefs.getString("sign"); "

Thanks
Maha
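
One workaround for the problem described above, independent of which parser is used, is to drop script and style elements before pulling out the text. The following is only a rough sketch using regular expressions from the standard library; the class name VisibleText and the method extract are made up for illustration, and regular expressions are not a real HTML parser, so malformed markup may still slip through.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of HTMLParser or any other library).
class VisibleText {
    // A whole <script> or <style> element, including its body.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // HTML comments, which can also hide script code.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String extract(String html) {
        // Remove script/style first so their bodies never reach the text.
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        // Collapse runs of whitespace left behind by the removed markup.
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p>"
                + "<script>function tarot6() { var sign = 1; }</script>"
                + "<p>world</p></body></html>";
        System.out.println(extract(html)); // prints "Hello world"
    }
}
```

With a real parser the same idea applies: skip any text node whose enclosing element is script or style.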
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41906
    
What is "the HTML parser"?
Maha Hassan
Ranch Hand

Joined: Aug 02, 2005
Posts: 133
This is HTMLParser.
[ September 13, 2006: Message edited by: Maha Hassan ]
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41906
    
Don't know about that one, but JTidy, NekoXNI and TagSoup seem to be more widely used.
Maha Hassan
Ranch Hand

Joined: Aug 02, 2005
Posts: 133
I am now using JTidy.
I want to extract the text within the tags. The problem is that it does not handle things like the copyright sign, "-", " " and other special characters, and changing the encoding does not make things better.

Any ideas?
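
If the special characters appear in the page source as character references such as &copy; or &#8211;, the byte encoding is probably not what matters, because those references have to be decoded after parsing. Below is a minimal sketch of such a decoding step, assuming the extracted text still contains the references. The class EntityDecoder and its small table of names are invented for illustration and cover only a handful of entities; this is not how JTidy itself handles them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the entity table is deliberately incomplete.
class EntityDecoder {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00A0");  // non-breaking space
        NAMED.put("copy", "\u00A9");  // copyright sign
        NAMED.put("ndash", "\u2013"); // en dash
        NAMED.put("mdash", "\u2014"); // em dash
    }

    // Decimal (&#169;), hexadecimal (&#xA9;) or named (&copy;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left untouched rather than guessed.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String fromCodePoint(String digits, int radix, String fallback) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : fallback;
        } catch (NumberFormatException e) {
            return fallback; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("&copy; 2006 Big&nbsp;Moose &#8211; Saloon"));
    }
}
```

When printing or saving the result, the output encoding still has to be able to represent these characters (UTF-8 can), otherwise they may show up as question marks.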
 