Hey everyone. I'm trying to build a crawler, a search-engine-style crawler that crawls web pages and creates reports from them. This crawler will be a JSP. My problem is how to make it follow a link. Let's say the link is href="http://www.javaranch.com"; then it's OK, because I can just take the substring between the two double quotes. However, if the link points to an internal page, most pages write it as href="page2.html" or href="../page2.html". In that case, how do I make the crawler go to page2.html?
There is no real reason to make this a JSP, since you are going to end up with huge amounts of computation and data. Why not work on the guts of the crawler as a stand-alone application until you get it working right?
Trying to do it in JSP code will just be confusing and hard to debug.
Bill
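For the relative-link part of the question, one common approach in plain Java is to resolve each href against the URL of the page it was found on, using `java.net.URI.resolve`. The stand-alone class below is a minimal sketch under stated assumptions, not code from this thread: the class name `LinkResolver`, the method `resolveLinks`, and the example page URL are hypothetical, and the regex only picks up double-quoted `href` values, so a real crawler would be better served by a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: extracts href values and turns them into absolute URLs.
public class LinkResolver {

    // Assumption: links appear as href="..." with double quotes (naive, not a full HTML parser).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Finds every double-quoted href in the HTML and resolves it against the page's own URL.
    public static List<String> resolveLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            try {
                // Absolute hrefs come back unchanged; relative ones such as
                // "page2.html" or "../page2.html" are merged with the base path
                // and normalized (the ".." segment is removed).
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // URI.resolve(String) throws this for hrefs that are not valid URI syntax; skip them.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical page URL and HTML, mirroring the three cases from the question.
        String page = "http://www.example.com/docs/page1.html";
        String html = "<a href=\"http://www.javaranch.com\">JavaRanch</a>"
                + "<a href=\"page2.html\">next</a>"
                + "<a href=\"../page2.html\">up</a>";
        for (String link : resolveLinks(page, html)) {
            System.out.println(link);
        }
    }
}
```

Running `main` should print `http://www.javaranch.com`, `http://www.example.com/docs/page2.html`, and `http://www.example.com/page2.html`: the absolute link is kept as-is, `page2.html` lands in the same directory as the current page, and `../page2.html` goes up one level. The crawler then fetches those absolute URLs just as it would a link that was absolute to begin with.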