
screen scrape

dale con
Ranch Hand

Joined: Apr 15, 2005
Posts: 93
Hi all,

Can anyone give me an example of screen scraping a website and returning the result (e.g. the HTML) as a string?

Or point me to some tutorials or examples?

Many thanks
marc weber

Joined: Aug 31, 2004
Posts: 11343

Like TESS?

"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer
Tom Blough
Ranch Hand

Joined: Jul 31, 2003
Posts: 263

An old one I did a while ago that screenscraped the USPS Zip+4 info is located at

It looks like it doesn't work any more because USPS changed the website from CGI to JSP-based pages, but the idea still works. The source is available from a link on the page.


Tom Blough
"Cum catapultae proscriptae erunt tum soli proscripti catapultas habebunt."
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 15081

Reading the content of a webpage is simple enough with class

After you've done that, you'll have to find the stuff you want to find in the HTML page. You could do it the simple way, with String.indexOf() for example, but maybe that won't be flexible enough.
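To make the two steps above concrete, here is a minimal sketch: it downloads a page into a String and then cuts out the text between two markers with String.indexOf(). The class name PageFetcher, the helper names, the use of java.net.URI/URL to open the stream, and the UTF-8 decoding are all assumptions for illustration, not something taken from the posts above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative.
class PageFetcher {

    // Drain a stream into a String. UTF-8 is an assumption; a real page
    // may declare a different charset in its headers or markup.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Open the address and return the raw response body as a String.
    static String fetch(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        }
    }

    // Return the text between the first startMarker and the next endMarker,
    // or null if either marker is missing.
    static String between(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            return null;
        }
        start += startMarker.length();
        int end = html.indexOf(endMarker, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println(between(html, "<title>", "</title>"));
    }
}
```

The indexOf() approach is exact-match only, so it stops working as soon as the site changes the markup around the value you want, which is probably why it "won't be flexible enough" for anything beyond a quick job.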

You could try regular expressions, or you could use an HTML parser to walk through the structure of the HTML and find the text you're looking for. Something like might be useful for that.
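As a rough illustration of the regular expression route, the sketch below pulls the href value out of every anchor tag in a page that has already been read into a String. The class name LinkExtractor and the pattern itself are assumptions made up for this example; the pattern only handles double-quoted href attributes and will miss or mangle less tidy markup, which is where a real HTML parser earns its keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and pattern are illustrative only.
class LinkExtractor {

    // Matches <a ... href="value"> and captures the value.
    // Assumes double-quoted attributes; case-insensitive so <A HREF=...> also matches.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Return every captured href in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"a.html\">A</a> and "
                + "<A class=\"x\" HREF=\"http://example.com/b\">B</A></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Compared with indexOf(), the pattern tolerates extra attributes and different letter case, but it still treats the page as flat text rather than as a document tree.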

dale con
Ranch Hand

Joined: Apr 15, 2005
Posts: 93
Cheers guys for all your help, much appreciated.

I know this is a relatively old thing to do, but trying to find stuff out there is quite difficult.