
Reading web page from servlet

William EGreen

Joined: Mar 18, 2002
Posts: 2
How do I get the HTML text of a given web page from a servlet? I need to do some data mining. Note that the page in question may require a cookie; I have access to the cookie and can send it to the servlet.
Bill Green
Jessica Sant

Joined: Oct 17, 2001
Posts: 4313

You could use a Java program that accesses the website, makes a request, and writes the response to a file (thus saving the resulting HTML code).
You might be able to adapt the code from HttpUnit to do just that. It's meant to be a unit testing suite for web sites, but you could use it to store the data in the page rather than validate it.
It's an open source project available here:
Hope that helps.

- Jess
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63346

Check out URLConnection.
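
For anyone following along, a minimal sketch of the URLConnection approach might look like the code below. The class name PageFetcher and the method fetch are made up for illustration. It assumes the page is UTF-8 encoded and that the cookie is handed over as a ready-made "name=value" string; a real page may need a different charset, redirect handling, or more than one cookie.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class PageFetcher {

    /**
     * Fetches the body of the page at pageUrl and returns it as text.
     * cookieHeader may be null; otherwise it is sent unchanged as the
     * Cookie request header, e.g. "JSESSIONID=abc123".
     */
    public static String fetch(String pageUrl, String cookieHeader) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();

        // Request headers must be set before the connection is opened.
        if (cookieHeader != null) {
            conn.setRequestProperty("Cookie", cookieHeader);
        }

        // Assumption: 10-second timeouts are acceptable for the target site.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);

        StringBuilder html = new StringBuilder();
        // Assumption: the response is UTF-8; adjust if the site says otherwise.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Line endings are normalized to '\n'.
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

From a servlet's doGet you could then call PageFetcher.fetch(targetUrl, cookieValue) and run the data mining on the returned string. Passing the cookie in as a plain string keeps the helper independent of the servlet API, so it can also be tried from a standalone main method first.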

Kripal Singh
Ranch Hand

Joined: Jul 26, 2001
Posts: 254
Try using the following code:
