
screen scrape

 
dale con
Ranch Hand
Posts: 93
Hi all,

Can anyone give me an example of screen scraping a website and returning the result (e.g. the HTML) as a String, or point me to some tutorials or examples?

Many thanks
 
marc weber
Sheriff
Posts: 11343
Java Mac Safari
Like TESS?
 
Tom Blough
Ranch Hand
Posts: 263
Dale,

An old one I wrote a while ago that screen-scraped the USPS Zip+4 info is located at http://www.mycgiserver.com/~tblough/screenscrape.htm.

It no longer seems to work because USPS moved the website from CGI to JSP-based pages, but the idea still applies. The source is available from a link on the page.

Cheers,
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15150
Android IntelliJ IDE Java Scala Spring
Reading the content of a webpage into a String is simple enough with the class java.net.URL.
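A minimal sketch of that step, assuming the page is UTF-8 encoded (a sturdier version would take the charset from the Content-Type header). The names `PageFetcher` and `fetch` are made up for this example; `URL.openConnection()` and `URLConnection.getInputStream()` are the standard java.net calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// PageFetcher and fetch() are hypothetical names chosen for this sketch.
class PageFetcher {

    // Reads everything behind the given URL into one String.
    // Assumes UTF-8; a sturdier version would use the charset from the Content-Type header.
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            // Copy in chunks so the page's line breaks are preserved exactly.
            while ((n = in.read(buf)) != -1) {
                html.append(buf, 0, n);
            }
        }
        return html.toString();
    }
}
```

For example, `PageFetcher.fetch(java.net.URI.create("http://www.example.com/").toURL())` would return the raw HTML of that (placeholder) address as a String.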

After you've done that, you'll have to locate the data you want in the HTML page. You could do it the simple way, with String.indexOf() for example, but that may not be flexible enough.
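For the simple String.indexOf() approach, a hypothetical helper like the one below pulls out the text between two known markers, for instance the contents of the `<title>` tag. It only finds the first match and breaks as soon as the page layout changes, which is the flexibility problem described above.

```java
// TextExtractor and between() are hypothetical names for this sketch.
class TextExtractor {

    // Returns the text between the first occurrence of `start` and the next
    // occurrence of `end` after it, or null if either marker is missing.
    static String between(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return html.substring(from, to);
    }
}
```

Calling `TextExtractor.between(html, "<title>", "</title>")` on a fetched page gives its title, or null if the page has none.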

You could try regular expressions, or you could use an HTML parser to walk through the structure of the HTML and find the text you're looking for. Something like http://htmlparser.sourceforge.net/ might be useful for that.
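As an illustration of the regular-expression route, this sketch collects the `href` values of `<a>` tags using java.util.regex. The class `LinkExtractor` and the pattern are assumptions for this example, not part of any library. A regex like this copes with simple, quoted markup but skips unquoted attributes and can be fooled by comments or scripts, which is where a real HTML parser earns its keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// LinkExtractor is a hypothetical name for this sketch.
class LinkExtractor {

    // Matches <a ... href="..."> or href='...', case-insensitively, and
    // captures the quoted value. Unquoted href values are not matched.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every captured href value in the order it appears in the page.
    static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```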
 
dale con
Ranch Hand
Posts: 93
Cheers guys for all your help, much appreciated.

I know this is a relatively old thing to do, but finding material on it out there is quite difficult.
 