JavaRanch » Java Forums » Java » Java in General

Code for crawling websites hangs without any exception

ch pravin
Greenhorn

Joined: Nov 25, 2010
Posts: 20
Hello All,

I have been trying to crawl around 300,000 websites. For each site I connect and check the response code before proceeding with the crawl, and I am using HTML Parser 1.6 to retrieve the data. The code runs fine for a while (how long varies with each run) and then hangs. It doesn't throw any exception, but the program doesn't seem to be doing anything. I am unable to figure out what is going wrong in the code. I apologize for the poor quality of the code (code inside catch blocks, and most of the logic in main). Any help will be greatly appreciated.

I am attaching the code below:

fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11419

put in more System.out.println() statements to find out where it hangs...


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Luigi Plinge
Ranch Hand

Joined: Jan 06, 2011
Posts: 441

ch pravin
Greenhorn

Joined: Nov 25, 2010
Posts: 20
fred rosenberger wrote:put in more System.out.println() statements to find out where it hangs...


I see that it is unable to get the response code; essentially, the code hangs at the line huc.getResponseCode(). Is there any way to ping a website for a certain time period and, if it's not responding, just proceed to the next one?
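For reference (this is not from the thread): `java.net.URLConnection` provides `setConnectTimeout` and `setReadTimeout`, both in milliseconds. By default both are 0, meaning no timeout, which is consistent with a call like `getResponseCode()` blocking forever on a server that never answers. With a read timeout set, the call fails with a `SocketTimeoutException` instead. A minimal sketch, assuming `huc` is an `HttpURLConnection` for an http/https URL; the class name, method name, timeout values, and the choice to return -1 for "skip this site" are all hypothetical:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class ResponseCodeCheck {

    /**
     * Hypothetical helper: returns the HTTP status code for an http/https URL,
     * or -1 if the site cannot be reached or does not answer within the
     * given timeouts (milliseconds), so the caller can move on to the next site.
     */
    static int responseCodeOrMinusOne(String address, int connectMillis, int readMillis) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) URI.create(address).toURL().openConnection();
            huc.setConnectTimeout(connectMillis); // max wait to establish the TCP connection
            huc.setReadTimeout(readMillis);       // max wait for data once connected
            return huc.getResponseCode();
        } catch (SocketTimeoutException e) {
            System.err.println("Timed out: " + address);
            return -1;
        } catch (IOException | IllegalArgumentException e) {
            // Refused connection, unknown host, malformed URL, etc.
            System.err.println("Skipped " + address + ": " + e);
            return -1;
        } finally {
            if (huc != null) {
                huc.disconnect(); // close the connection
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local socket that accepts connections but never replies,
        // standing in for an unresponsive website.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            long start = System.nanoTime();
            int code = responseCodeOrMinusOne(url, 2000, 1000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("code=" + code + " after about " + elapsedMs + " ms");
        }
    }
}
```

Note that the connect timeout covers establishing the connection, not host name lookup, so a slow DNS server can still delay the call.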
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19720

You can use an ExecutorService in combination with Callable and Future. The actual work to be done goes into the Callable. You submit it to the ExecutorService, which returns you a Future. You then call the timed get method on the Future, catching the TimeoutException if it occurs.
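A minimal sketch of that approach using `java.util.concurrent`; it is not code from the thread. `TimedTask`, `runWithTimeout`, the pool size of 4, and returning `null` on timeout or failure are hypothetical choices. In the crawler, the `Callable` would wrap the connection setup and the `getResponseCode()` call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedTask {

    // Hypothetical shared pool; daemon threads so a stuck task cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    /**
     * Submits the task and waits at most timeoutMillis for its result.
     * Returns null if the task times out, fails, or the caller is interrupted.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread blocked in a plain java.net socket
            // read generally does not respond to interruption.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            return null; // the task itself threw an exception
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        Integer fast = runWithTimeout(() -> 200, 1000);
        Integer slow = runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a site that never answers
            return 200;
        }, 100);
        System.out.println("fast=" + fast + ", slow=" + slow);
    }
}
```

The timed `get` only stops the caller from waiting; it does not force the worker to finish. A worker stuck in socket I/O keeps occupying a pool thread, so also setting `URLConnection`'s own connect and read timeouts is still worthwhile.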


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
ch pravin
Greenhorn

Joined: Nov 25, 2010
Posts: 20
Rob Spoor wrote:You can use an ExecutorService in combination with Callable and Future. The actual work to be done goes into the Callable. You submit it to the ExecutorService, which returns you a Future. You then call the timed get method on the Future, catching the TimeoutException if it occurs.


Thanks for the reply. I'll look into it!