
Read HTML/Source of external site page

 
Deepak Vadgama
Greenhorn
Posts: 29
Howdy Ranchers,

I want to read the HTML content of an external website page (e.g. www.yahoo.com). Could you please give me pointers on where to start?

I want my servlet/container to act as a client: request the external website page and receive it as HTML/text.

I initially toyed with the idea of forwarding the request to the external site's URL and using a Filter to intercept and parse the response. But the problem with this approach is that we cannot forward a request to a page outside the JVM (I am not sure about this).

Thanks
 
David O'Meara
Rancher
Posts: 13459
Something like InputStream in = new URL(abc).openStream(); will get you the raw bytes of the page (openConnection() returns a URLConnection if you need to set headers or read the response code). Check the API. If you need to 'talk HTTP' then you will need to look at something like Apache's HttpClient. If you want to be responsible, you should look at the behaviour expected of HTTP proxies and act accordingly.
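A minimal sketch of the plain-JDK route, assuming nothing beyond java.net and java.io. It opens a URLConnection, sets timeouts and a User-Agent header (the agent string and timeout values are hypothetical), checks the status code when the connection is HTTP, and decodes the body using the charset from the Content-Type header, falling back to UTF-8 as an assumption. It is not Apache HttpClient and not a well-behaved proxy: it ignores encodings declared in a <meta> tag, does not handle cookies, and HttpURLConnection only follows redirects that stay on the same protocol.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Downloads the resource at the given URL and returns its body as text.
    public static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(5_000);  // milliseconds; example values only
        conn.setReadTimeout(10_000);
        // Hypothetical User-Agent string; some sites refuse requests without one.
        conn.setRequestProperty("User-Agent", "PageFetcher/0.1");

        // Only http(s) URLs yield an HttpURLConnection with a status code to check.
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // Includes 3xx: redirects that switch protocol (http -> https) are not followed.
                throw new IOException("HTTP " + status + " for " + address);
            }
        }

        Charset charset = charsetOf(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            return new String(readAll(in), charset);
        }
    }

    // Extracts the charset parameter from a Content-Type value, defaulting to UTF-8.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: fall back to UTF-8
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads a stream fully into a byte array (works on Java versions before 9).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = fetch(args.length > 0 ? args[0] : "https://www.yahoo.com/");
        // Print only the start of the page.
        System.out.println(html.substring(0, Math.min(200, html.length())));
    }
}
```

Running java PageFetcher https://www.yahoo.com/ prints the first 200 characters of the page. Inside a servlet you would call fetch() from doGet() and parse or rewrite the returned string; for anything beyond a simple GET (cookies, forms, JavaScript) a library such as HttpClient or HtmlUnit is the better fit.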
 
Deepak Vadgama
Greenhorn
Posts: 29
Thanks a lot, David, for the instant reply.

There is an open-source project, HtmlUnit, which is based on Apache HttpClient:
http://htmlunit.sourceforge.net/gettingStarted.html

It has a simple API that meets my requirements.

Thanks a lot again
 