JavaRanch » Java Forums » Java » Servlets

Read HTML/Source of external site page

Deepak Vadgama
Greenhorn

Joined: Jul 02, 2007
Posts: 29
Howdy Ranchers,

I want to read the HTML content of an external web page (e.g. www.yahoo.com). Could you please give me some pointers on where to start?

I want my servlet/container to act as a client: request the external web page and receive it as HTML/text.

I initially toyed with the idea of forwarding the request to the external site's URL and using a Filter to intercept and parse the response. The problem with this approach is that, as far as I know, a request cannot be forwarded to a page outside the JVM (I am not sure about this).

Thanks


-- deepak
SCJP 5.0, SCWCD 5.0
David O'Meara
Rancher

Joined: Mar 06, 2001
Posts: 13459

InputStream in = new URL(abc).openStream() or something like that (openStream() is shorthand for openConnection().getInputStream()). Check the API. If you need to 'talk HTTP' then you will need to look at something like Apache's HttpClient. If you want to be responsible, you should look at the behaviour expected of HTTP proxies and act accordingly.
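For reference, here is a minimal sketch of the plain java.net approach, assuming only the JDK (no Apache HttpClient, no proxy handling). The class and method names (PageFetcher, fetch, charsetFromContentType, readAll), the timeout values and the ISO-8859-1 fallback charset are hypothetical choices for illustration, not part of any library API. The fallback follows the old HTTP/1.1 (RFC 2616) default for text types; many modern pages declare UTF-8 in a meta tag instead, which this sketch does not look at.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Opens a connection to the given URL and returns the response body as text. */
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(5000);  // milliseconds; arbitrary values for this sketch
        conn.setReadTimeout(10000);
        // Decode using the charset from the Content-Type header, if the server sent one.
        Charset cs = charsetFromContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, cs);
        }
    }

    /** Extracts the charset parameter from a Content-Type header; assumed fallback is ISO-8859-1. */
    public static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break;  // illegal or unsupported charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    /** Reads the whole stream into memory and decodes it with the given charset. */
    public static String readAll(InputStream in, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        String html = fetch(args.length > 0 ? args[0] : "http://www.yahoo.com/");
        // Print only the beginning of the page to keep the output short.
        System.out.println(html.substring(0, Math.min(200, html.length())));
    }
}
```

Run it as java PageFetcher http://www.yahoo.com/ to print the first 200 characters of the page. Inside a servlet you would call fetch() from doGet() and parse or copy the returned string; anything beyond a simple GET (cookies, redirects across hosts, forms) is where a library such as HttpClient starts to pay off.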
Deepak Vadgama
Greenhorn

Joined: Jul 02, 2007
Posts: 29
Thanks a lot for the instant reply, David.

I found an open-source project, HtmlUnit, which is based on Apache HttpClient:
http://htmlunit.sourceforge.net/gettingStarted.html

It has a simple API that meets my requirements.

Thanks a lot again