
JSP Crawler

Shashank Agarwal
Ranch Hand

Joined: May 20, 2004
Posts: 105
Hey everyone. I'm trying to build a search-engine-style crawler that crawls web pages and creates reports from them. The crawler will be a JSP. The problem is how to make it follow a link. If the link is href="http://www.javaranch.com", that's fine: I can take the substring between the two double quotes. However, if the link points to an internal page, most pages write it as href="page2.html" or href="../page2.html". In that case, how do I make the crawler go to page2.html?

I hope I'm able to put across my problem.

Thanks in advance.
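
For reference, one common way to handle the relative-link case described above is to resolve each href against the URL of the page it was found on. The following is only a minimal sketch using `java.net.URI` from the standard library; the class name `LinkResolver`, the method `resolve`, and the example.com URLs are made up for illustration and are not from this thread.

```java
import java.net.URI;

// Hypothetical helper, named for illustration only.
class LinkResolver {

    // Resolves an href value against the URL of the page that contains it.
    // Absolute hrefs come back unchanged; relative ones are combined with
    // the page URL. URI.create throws IllegalArgumentException for strings
    // that are not valid URIs (for example, ones containing spaces).
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumed page URL, chosen only for this example.
        String page = "http://www.example.com/docs/page1.html";
        System.out.println(resolve(page, "page2.html"));
        System.out.println(resolve(page, "../page2.html"));
        System.out.println(resolve(page, "http://www.javaranch.com"));
    }
}
```

With a page URL that has a path, as here, "page2.html" resolves to a sibling of page1.html and "../page2.html" resolves one directory up. Edge cases such as a page URL with no path at all, or a page that sets a base tag, are worth testing separately.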
Pritam Barhate

Joined: Nov 25, 2004
Posts: 15
See the article "Create intelligent Web spiders" at JavaWorld.

Pritam Barhate
A magic combination of Code & Fire: codefire (http://www.jroller.org/page/codefire/Weblog)
My Open Source Projects:
AceMDI (https://acemdi.dev.java.net/): An easy, yet powerful MDI framework that manages windows as tabs.
Sonny Gill
Ranch Hand

Joined: Feb 02, 2002
Posts: 1211

And there is a chapter (or two) on it in the book The Art of Java.
William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13037
There is no real reason to make this a JSP, since you are going to end up with huge amounts of computation and data. Why not work on the guts of the crawler as a stand-alone application until you get it working right?

Trying to do it in JSP code will just be confusing and hard to debug.
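
To illustrate the stand-alone approach, here is a rough sketch of a crawler core that can be run and debugged from a plain `main` method with no servlet container and no network access. It pulls double-quoted href values out of an HTML string with a regular expression and resolves them against the page URL. The class name `CrawlerCore`, the method `extractLinks`, and the sample HTML are assumptions made up for this example, and a regex is only a crude stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone crawler core, named for illustration only.
class CrawlerCore {

    // Matches href="..." with double quotes, ignoring case.
    // Single-quoted or unquoted attributes are not handled in this sketch.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every double-quoted href in the HTML as an absolute URL,
    // in the order the links appear on the page.
    static List<String> extractLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Hard-coded sample page so the logic can be tested offline.
        String html = "<a href=\"page2.html\">two</a> <a HREF=\"../up.html\">up</a>";
        for (String link : extractLinks("http://www.example.com/docs/index.html", html)) {
            System.out.println(link);
        }
    }
}
```

Once something like this works from the command line, a JSP or servlet only needs to call it and display the results.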