
SCRAPE

Aaron O'Brien
Ranch Hand

Joined: May 24, 2002
Posts: 89
Just wondering if anyone has any information they can pass along about scraping information from web sites, such as stock quotes or movie listings. I am interested to see what it would take.
Thanks in advance.


Aaron O'Brien
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 60774

You can pretty easily "suck in" the HTML at any URL via java.net.URLConnection.
From there you can either parse the HTML (google for various HTML parsers) or simply use some regular expression searches depending upon your needs.
bear
P.S. There are ethical considerations to screen scraping, so please be sure not to use any "scraped" data in any way other than the original publishers intended.
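A minimal sketch of the approach Bear describes: fetch the page with java.net.URLConnection, then pull values out with a regular expression. It assumes, purely for illustration, a page that lists prices in cells like `<td class="price">12.50</td>` and is encoded as UTF-8. The class name `PageScraper`, its method names, and that markup are hypothetical; a real site will have its own layout, so the pattern would need adjusting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageScraper {

    // Reads the whole response body at the given URL as text.
    // Assumes UTF-8; a real page may declare another charset in its Content-Type header.
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(5000); // milliseconds
        conn.setReadTimeout(5000);
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Hypothetical markup: each price sits alone in a <td class="price"> cell.
    private static final Pattern PRICE =
            Pattern.compile("<td class=\"price\">\\s*([0-9.]+)\\s*</td>");

    // Returns the captured number from every matching cell, in page order.
    public static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(html);
        while (m.find()) {
            prices.add(m.group(1));
        }
        return prices;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PageScraper <url>");
            return;
        }
        System.out.println(extractPrices(fetch(args[0])));
    }
}
```

Run it as `java PageScraper <url>`. The regex route is quick but brittle: any change to the site's markup can silently break the pattern, which is where a real HTML parser pays off.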


Aaron O'Brien
Ranch Hand

Joined: May 24, 2002
Posts: 89
Thanks Bear,
I will be sure to give this a try...and to use my program wisely.
Take Care.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Once you are pulling content from other sites, the Quiotix Parser may make understanding it easier.
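The Quiotix Parser's own API isn't shown in this thread, so the sketch below swaps in the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator` driving an `HTMLEditorKit.ParserCallback`) to illustrate the same idea: react to parsed tags instead of matching raw text. The class name `LinkExtractor` and the choice to collect `<a href>` values are illustrative assumptions, not part of either library.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Parses the HTML and collects the href of every <a> start tag that has one.
    public static List<String> links(Reader html) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document itself
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LinkExtractor <url>");
            return;
        }
        // Assumes the page is UTF-8 encoded.
        try (Reader in = new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8)) {
            links(in).forEach(System.out::println);
        }
    }
}
```

Because the parser hands you tags and attributes rather than raw text, the extraction keeps working when whitespace, attribute order, or surrounding markup changes, which is the main advantage over the regex approach.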


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Simon Brown
sharp shooter, and author
Ranch Hand

Joined: May 10, 2000
Posts: 1913
Also check out the Scrape Tag Library.