library to parse website with javascript engine

 
Ranch Hand
Posts: 33
Hi,

I am looking for a preferably Java-based library to parse a website. The difficulty, as I see it, is that some of the information is actually displayed through a JavaScript function (triggered by an onMouseOver event) and hence would not be part of the HTML DOM for a parser library to access directly (I suppose; I have no experience with website parsing).

A possible solution would be to search for the information with regexps, which may turn out to be complicated, so I am wondering whether a better solution would be to use some type of Java library with JavaScript support. Does anyone know if such a library exists?

thanks
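For reference, the regexp approach mentioned above might look roughly like the sketch below. It is only an illustration under assumptions: the handler name `showInfo` and the sample markup are hypothetical, since the actual page is not shown in this thread, and the pattern only works when the text is passed to the handler as a single-quoted string literal. That fragility is the reason a library that actually executes the JavaScript is likely the better route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: pulls the string argument out of inline onMouseOver handlers.
// "showInfo" is a hypothetical handler name; the real page may use another one.
final class MouseOverExtractor {

    // Matches onMouseOver="...showInfo('some text')..." and captures the text.
    private static final Pattern MOUSE_OVER = Pattern.compile(
            "onmouseover\\s*=\\s*\"[^\"]*?showInfo\\('([^']*)'\\)[^\"]*\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured argument of every matching handler, in document order.
    static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        Matcher matcher = MOUSE_OVER.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not taken from any real site.
        String html = "<a href=\"#\" onMouseOver=\"showInfo('Price: 42 EUR')\">item</a>";
        System.out.println(extract(html));
    }
}
```

This only sees text that is literally present in the page source. Anything the script computes or fetches at runtime would be missed, so it is a fallback at best.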
 
Rancher
Posts: 43081
The leading Java libraries for handling web pages (HtmlUnit, jWebUnit) can handle JavaScript. They provide access to the page DOM, and I would assume that it's updated as the script code is executed. But a quick test would make sure either way.
 
Denis Wen
Ranch Hand
Posts: 33
Thanks for the links. I have taken a quick look, and it seems almost the right thing :)
Do you think it is possible to subvert their purpose of providing HTML unit tests and use them for parsing static and JavaScript-generated HTML?
 
Ulf Dittmer
Rancher
Posts: 43081
Not only do I think that - I know so. They're even more useful for that purpose, in my opinion.