
Code for crawling websites hangs without any exception

 
ch pravin
Greenhorn
Posts: 20
Hello All,

I have been trying to crawl around 300,000 websites. For each one, I connect and check the response code before proceeding with the crawl. I am using HTML Parser 1.6 to retrieve the data. The code runs fine for some time (how long varies with each run) and then hangs. It doesn't throw any exception, but the program doesn't seem to be doing anything. I am unable to figure out what is going wrong. I apologize for the poor quality of the code (code inside catch blocks, and most of the code in main). Any help will be greatly appreciated.

I am attaching the code below:

 
fred rosenberger
lowercase baba
Bartender
Posts: 12143
Put in more System.out.println() statements to find out where it hangs.
 
Luigi Plinge
Ranch Hand
Posts: 441
 
ch pravin
Greenhorn
Posts: 20
fred rosenberger wrote: put in more System.out.println() statements to find out where it hangs...


I see that it is unable to get the response code; essentially, the code hangs at the line huc.getResponseCode(). Is there any way to ping a website for a certain time period and, if it's not responding, just proceed to the next one?
 
Rob Spoor
Sheriff
Posts: 20546
You can use an ExecutorService in combination with Callable and Future. The actual work to be done goes into the Callable. You submit it to the ExecutorService, which returns you a Future. You then call the timed get method on the Future, catching the TimeoutException if it occurs.
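
A minimal sketch of that approach, under some assumptions: the class name TimedFetch, the helper names callWithTimeout and responseCodeTask, the -1 sentinel, and the 5-second limit are all made up for illustration and are not taken from the original code. The call that hangs (huc.getResponseCode()) goes into the Callable, and the caller waits on the Future with a time limit.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedFetch {

    // Hypothetical sentinel meaning "no answer in time" or "the request failed".
    static final int NO_RESPONSE = -1;

    // Submits the task to the pool and waits at most timeoutMillis for its result.
    static int callWithTimeout(ExecutorService pool, Callable<Integer> task, long timeoutMillis) {
        Future<Integer> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker to stop; a thread blocked in socket I/O may not react to this.
            future.cancel(true);
            return NO_RESPONSE;
        } catch (ExecutionException e) {
            // The task itself threw, for example an IOException.
            return NO_RESPONSE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return NO_RESPONSE;
        }
    }

    // The work that can hang goes into the Callable.
    static Callable<Integer> responseCodeTask(String address) {
        return () -> {
            HttpURLConnection huc =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            try {
                return huc.getResponseCode();
            } finally {
                huc.disconnect();
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (String address : args) {
                // 5000 ms is an arbitrary limit chosen for this sketch.
                int code = callWithTimeout(pool, responseCodeTask(address), 5000);
                System.out.println(address + " -> " + (code == NO_RESPONSE ? "no response" : code));
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A cached thread pool is used here so that a worker stuck on one site does not block the next submission. Cancelling the Future only interrupts the worker thread, so a stuck connection may still hold its thread until the socket gives up.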
 
ch pravin
Greenhorn
Posts: 20
Rob Spoor wrote: You can use an ExecutorService in combination with Callable and Future. The actual work to be done goes into the Callable. You submit it to the ExecutorService, which returns you a Future. You then call the timed get method on the Future, catching the TimeoutException if it occurs.


Thanks for the reply. I'll look into it!
 