
screen scrape

 
dale con
Ranch Hand
Posts: 93
Hi all,

Can anyone give me an example of screen scraping a website and returning the result (e.g. the HTML) as a string?

Or point me to some tutorials / examples?

Many thanks
 
marc weber
Sheriff
Posts: 11343
Java Mac Safari
Like TESS?
 
Tom Blough
Ranch Hand
Posts: 263
Dale,

An old example I wrote a while ago, which screen-scraped the USPS ZIP+4 info, is located at http://www.mycgiserver.com/~tblough/screenscrape.htm.

It looks like it doesn't work any more because USPS changed the website from CGI to JSP-based pages, but the idea still applies. The source is available from a link on the page.

Cheers,
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15281
Android IntelliJ IDE Java Scala Spring
Reading the content of a webpage is simple enough with class java.net.URL:
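A minimal sketch of that approach, not a definitive implementation. The class name `PageFetcher` and the method name `fetch` are made up for illustration, and the code assumes the page is encoded as UTF-8 (a real page may declare something else):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for this sketch.
class PageFetcher {

    // Opens the given address with java.net.URL and returns everything
    // the server sends back as one String.
    public static String fetch(String address) throws IOException {
        URL url = new URL(address);
        StringBuilder html = new StringBuilder();
        // Assumption: the content is UTF-8 text.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

Calling `PageFetcher.fetch("http://www.example.com/")` would then give you the page source as a string, ready for searching.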

After you've done that, you'll have to find the stuff you want to find in the HTML page. You could do it the simple way, with String.indexOf() for example, but maybe that won't be flexible enough.
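As a rough illustration of the `String.indexOf()` way, here is a sketch that pulls out whatever sits between two known markers. The class `Between` and its method are hypothetical names, and the markers you pass in depend entirely on the page you are scraping:

```java
// Hypothetical helper; names are assumptions for this sketch.
class Between {

    // Returns the text between the first occurrence of 'start' and the
    // next occurrence of 'end' after it, or null if either is missing.
    public static String between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return html.substring(from, to);
    }
}
```

For example, `Between.between(html, "<title>", "</title>")` would return the page title, as long as the tags are written exactly like that. That exact-match requirement is why this approach tends to break when the page layout changes.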

You could try regular expressions, or you could use an HTML parser to walk through the structure of the HTML and find the text you're looking for. Something like http://htmlparser.sourceforge.net/ might be useful for that.
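For the regular expression route, a small sketch using `java.util.regex` to collect the `href` values of links. The class name `LinkExtractor` is made up, and the pattern only handles double-quoted attributes, so treat it as a starting point rather than a general HTML solution:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are assumptions for this sketch.
class LinkExtractor {

    // Matches href="..." in any letter case, with optional spaces around '='.
    // Assumption: attribute values are double-quoted.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the HTML, in document order.
    public static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```

Regular expressions cope with small variations (case, spacing) better than `indexOf()`, but for nested or messy markup a real HTML parser is usually the safer choice.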
 
dale con
Ranch Hand
Posts: 93
Cheers guys for all your help, much appreciated.

I know this is a relatively old thing to do, but finding material on it out there is quite difficult.
 