
JSP Crawler

 
Shashank Agarwal
Ranch Hand
Posts: 105
Hey everyone. I am trying to build a search-engine-style crawler that crawls web pages and creates reports from them. The crawler will be a JSP. The problem is how to make it follow a link. If the link is absolute, say href="http://www.javaranch.com", that is fine: I can take the substring between the two double quotes. However, links to internal pages are usually relative, such as href="page2.html" or href="../page2.html". In that case, how do I make the crawler go to page2.html?

I hope I have explained my problem clearly.

Thanks in advance.
 
Pritam Barhate
Greenhorn
Posts: 15
See the article "Create intelligent Web spiders" at JavaWorld.
 
Sonny Gill
Ranch Hand
Posts: 1211
There is also a chapter (or two) on it in the book The Art of Java.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13058
There is no real reason to make this a JSP, since you are going to end up with huge amounts of computation and data. Why not work on the guts of the crawler as a stand-alone application until you get it working right?

Trying to do it in JSP code will just be confusing and hard to debug.
Bill
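
On the original question of following relative links such as "page2.html" or "../page2.html": one common approach is to resolve each href against the URL of the page it was found on. The following is only a minimal sketch, written as a stand-alone class rather than a JSP. The class name LinkResolver, its method names, and the example.com page URL are made up for illustration, and the regular expression is a naive stand-in for a real HTML parser that only handles double-quoted href values.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
final class LinkResolver {

    // Naive pattern: captures whatever sits between the double quotes of href="...".
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Resolves an href (absolute or relative) against the URL of the containing page.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Finds every double-quoted href in the HTML and returns the links as absolute URLs.
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(resolve(pageUrl, m.group(1)));
        }
        return links;
    }

    public static void main(String[] args) {
        // Assumed page URL, used only for this example.
        String page = "http://www.example.com/docs/index.html";
        System.out.println(resolve(page, "page2.html"));
        // prints http://www.example.com/docs/page2.html
        System.out.println(resolve(page, "../page2.html"));
        // prints http://www.example.com/page2.html
        System.out.println(resolve(page, "http://www.javaranch.com"));
        // prints http://www.javaranch.com
    }
}
```

Because java.net.URI.resolve returns an absolute href unchanged, the crawler can pass every link through the same method without first checking whether it is relative.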
 