
HTML parser

 
Maha Hassan
Ranch Hand
Posts: 133
Hi all,
I am using the HTML parser, but it has a problem: it sometimes extracts JavaScript code as part of the text of the HTML page.
Do you know of a better parser?

For example, when I tried it with "http://www.google.ca/ig?hl=en", it produced the following as part of the text:

"'; _gel('t6').innerHTML = htmlmsg; } function tarot6() { var prefs = new _IG_Prefs(6); var sign = prefs.getString("sign"); "


Thanks
Maha
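
Whichever parser is used, one way to keep script bodies out of the extracted text is to ignore everything between <script> (and <style>) tags. Below is a minimal sketch of that idea. It uses the HTML parser that ships with the JDK (javax.swing.text.html), not HTMLParser, and the names VisibleText and extract are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects text nodes, skipping <script> and <style> content.
class VisibleText {

    static String extract(String html) {
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Greater than zero while we are inside a <script> or <style> element.
            private int skipDepth = 0;

            private boolean isSkipped(HTML.Tag tag) {
                return tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE;
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (isSkipped(tag)) {
                    skipDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (isSkipped(tag) && skipDepth > 0) {
                    skipDepth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Keep text only when it is not part of a script or style block.
                if (skipDepth == 0) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(data);
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling VisibleText.extract(pageSource) returns only the text outside those elements. The same depth-counter idea should carry over to other callback-style or SAX-style parsers, though the callback names will differ.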
 
Ulf Dittmer
Rancher
Posts: 43028
Which parser do you mean by "the HTML parser"?
 
Maha Hassan
Ranch Hand
Posts: 133
This is HTMLParser.
[ September 13, 2006: Message edited by: Maha Hassan ]
 
Ulf Dittmer
Rancher
Posts: 43028
Don't know about that one, but JTidy, NekoXNI and TagSoup seem to be more widely used.
 
Maha Hassan
Ranch Hand
Posts: 133
I am now using JTidy.
I want to extract the text within the tags. The problem is that it does not handle things like the copyright sign, "-", " " and other special characters, and changing the encoding does not make things better.

Any ideas?
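
If the special characters arrive in the extracted text as undecoded character references (for example &copy; or &#8212;), which is only an assumption about what is happening here, changing the encoding will not help and the references have to be decoded separately. Below is a rough sketch of such a decoder in plain Java. The class name EntityDecoder and the small table of named entities are made up for illustration and are far from complete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric references and a handful of named entities.
class EntityDecoder {

    // Deliberately incomplete table; extend it as needed.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", (int) '&');
        NAMED.put("lt", (int) '<');
        NAMED.put("gt", (int) '>');
        NAMED.put("quot", (int) '"');
        NAMED.put("nbsp", 0x00A0);   // non-breaking space
        NAMED.put("copy", 0x00A9);   // copyright sign
        NAMED.put("ndash", 0x2013);  // en dash
        NAMED.put("mdash", 0x2014);  // em dash
    }

    // Matches &#xHEX; or &#DEC; or &name;
    private static final Pattern REF =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            String body = m.group(1);
            Integer codePoint;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                codePoint = parse(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                codePoint = parse(body.substring(1), 10);
            } else {
                codePoint = NAMED.get(body);
            }
            if (codePoint != null && Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // unknown reference: leave it untouched
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    private static Integer parse(String digits, int radix) {
        try {
            return Integer.parseInt(digits, radix);
        } catch (NumberFormatException e) {
            return null; // too large to be a code point
        }
    }
}
```

If the characters still look wrong after decoding, the cause is more likely the charset used when reading the page or when printing the result, so it is worth checking both ends.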
 