JavaRanch » Java Forums » Java » Servlets

Reading web page from servlet

William EGreen

Joined: Mar 18, 2002
Posts: 2
How do I get the HTML text of a given web page from a servlet? I need it for some data mining. Note also that the page in question may require a cookie; I have access to the cookie and can send it to the servlet.
Bill Green
Jessica Sant

Joined: Oct 17, 2001
Posts: 4313

You could use a Java program that accesses the website, makes a request, and writes the response to a file (thus saving the resulting HTML code).
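The approach above (request the page, save the response to a file) can be sketched with the standard library alone. This is a minimal sketch, not code posted in the thread: `PageSaver` and `save` are hypothetical names, and it assumes the page needs no cookie and that the raw bytes should be stored unchanged, with no charset decoding. It needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: requests a page and saves the raw response body to a local file. */
public class PageSaver {

    /** Downloads pageUrl and writes the bytes to target, returning the number of bytes written. */
    public static long save(String pageUrl, Path target) throws IOException {
        // openStream() is shorthand for openConnection().getInputStream()
        try (InputStream in = URI.create(pageUrl).toURL().openStream()) {
            // Copy the body unchanged, replacing any earlier snapshot of the page
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java PageSaver <url> <output-file>");
            return;
        }
        long bytes = save(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Run it as, for example, `java PageSaver http://www.example.com/ page.html`. Saving bytes rather than characters keeps the file identical to what the server sent, which is convenient if you parse it later with a tool that does its own charset detection.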
You might be able to adapt the code from HttpUnit to do just that. It's meant to be a website unit-testing suite, but you could use it to store the data in the page rather than validate it.
It's an open source project available here:
Hope that helps.

- Jess
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63852

Check out URLConnection.
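URLConnection also covers the cookie requirement from the question: the servlet can forward the cookie it already has as a `Cookie` request header. The class below is a minimal sketch, not code posted in this thread. The name `PageFetcher`, the 10-second timeouts and the UTF-8 fallback when the server sends no charset are assumptions. It needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: fetches a page's HTML, optionally sending a cookie the caller already has. */
public class PageFetcher {

    /** cookieHeader is in "name=value" form; several cookies are separated by "; ". */
    public static String fetch(String pageUrl, String cookieHeader) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            // Request headers must be set before the connection is opened
            conn.setRequestProperty("Cookie", cookieHeader);
        }
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        // getContentType() connects, so the header and timeouts are set above it
        Charset charset = charsetOf(conn.getContentType());
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                html.append(buf, 0, n);
            }
        }
        return html.toString();
    }

    /** Reads the charset from a Content-Type like "text/html; charset=ISO-8859-1", else UTF-8. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    try {
                        return Charset.forName(p.substring(8).replace("\"", "").trim());
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: fall through to the default
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

For example, `PageFetcher.fetch("http://www.example.com/report", "SESSIONID=" + value)` returns the page as a string. Inside a servlet, the value could come from the `Cookie` objects returned by `request.getCookies()`. The cookie name `SESSIONID` is only a placeholder.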

Kripal Singh
Ranch Hand

Joined: Jul 26, 2001
Posts: 254
Try using the following code:
