JSP Crawler

 
Ranch Hand
Posts: 105
Hey everyone. I am trying to build a crawler, the kind a search engine uses, to crawl web pages and create reports from them. The crawler will be a JSP. The problem is how to make it follow a link. If the link is href="http://www.javaranch.com", that's fine: I can take the substring between the two double quotes. However, if the link points to an internal page, most pages write it as href="page2.html" or href="../page2.html". How do I make the crawler go to page2.html in that case?

I hope I'm able to put across my problem.

Thanks in advance.
 
Greenhorn
Posts: 15
See the article "Create intelligent Web spiders" at JavaWorld.
 
Ranch Hand
Posts: 1211
There is also a chapter (or two) on this in the book The Art of Java.
 
Author and all-around good cowpoke
Posts: 13078
There is no real reason to make this a JSP, since you are going to end up with huge amounts of computation and data. Why not work on the guts of the crawler as a stand-alone application until you get it working right?

Trying to do it in JSP code will just be confusing and hard to debug.
Bill
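
For the original question about following href="page2.html" or href="../page2.html", here is a minimal sketch of one way to do it in a stand-alone class: resolve each href against the URL of the page it was found on, using java.net.URI.resolve from the standard library. The class name LinkResolver, the method resolveLinks, and the example.com base URL are made up for illustration, and the regular expression only handles double-quoted href values as described in the question; a real crawler would likely want a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the thread or from any library.
public class LinkResolver {

    // Simplistic pattern: captures whatever sits between the double quotes of href="...".
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /**
     * Finds double-quoted href values in html and resolves each one against
     * pageUrl, the URL of the page the html was fetched from.
     * Absolute links come back unchanged; relative links become absolute.
     * An href that is not a legal URI (e.g. contains spaces) makes
     * URI.resolve throw IllegalArgumentException; this sketch does not catch it.
     */
    public static List<String> resolveLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(base.resolve(m.group(1)).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        // Assumed example page and markup, for illustration only.
        String pageUrl = "http://www.example.com/docs/index.html";
        String html = "<a href=\"http://www.javaranch.com\">one</a>"
                + "<a href=\"page2.html\">two</a>"
                + "<a href=\"../page2.html\">three</a>";
        for (String link : resolveLinks(pageUrl, html)) {
            System.out.println(link);
        }
    }
}
```

With the assumed base URL above, page2.html resolves to http://www.example.com/docs/page2.html and ../page2.html resolves to http://www.example.com/page2.html, while the absolute javaranch link is returned as it is. The crawler can then fetch each resolved URL the same way it fetched the first page.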
 