
Reading web page from servlet

 
William EGreen
Greenhorn
Posts: 2
How do I get the HTML text of a given web page from a servlet? I need it for some data mining. Note that the page in question may require a cookie; I have access to the cookie and can send it to the servlet.
Thanks,
Bill Green
 
Jessica Sant
Sheriff
Posts: 4313
You could use a Java program that accesses the website, makes a request, and writes the response to a file (thus saving the resulting HTML).
You might be able to adapt the code from HttpUnit to do just that. It's meant to be a unit-testing suite for web sites, but you could use it to store the data in the page rather than validate it.
It's an open source project available here:
http://httpunit.sourceforge.net/
Hope that helps.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 64708
Check out java.net.URLConnection.
hth,
bear
 
Kripal Singh
Ranch Hand
Posts: 254
Try using the following code:
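Something along these lines should work with plain java.net.URLConnection. This is only a sketch: the class name PageFetcher, the method names, the 10-second timeouts, and the UTF-8 decoding are all assumptions made for illustration. The cookie is passed in as a raw Cookie header value such as "JSESSIONID=abc123".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and API are illustrative only.
public class PageFetcher {

    // Fetches the resource at pageUrl and returns its body as text.
    // cookieHeader is the raw value of the Cookie request header,
    // or null/empty if the page does not need one.
    public static String fetch(String pageUrl, String cookieHeader) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();

        // Request properties must be set before the connection is opened.
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            conn.setRequestProperty("Cookie", cookieHeader);
        }

        // Assumed timeouts so a slow site cannot hang the servlet thread forever.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);

        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    // Reads the whole stream into a String, assuming UTF-8 content.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

From a servlet you could call PageFetcher.fetch(url, cookieValue) inside doGet and then mine the returned string. A real page may declare a different charset in its Content-Type header, so the hard-coded UTF-8 may need adjusting.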
 