The moose likes Threads and Synchronization and the fly likes Web Crawler - why it needs thread ?? Big Moose Saloon



Web Crawler - why it needs thread ??

marlajee Borstone
Ranch Hand

Joined: Jun 26, 2008
Posts: 35
Hello friends, how are you?

These days I am working on a web crawler. However, I am quite weak in threading.

Here is the code:



I understood all the aspects of this code except the use of Thread.
Why have we used a Thread here? As far as I can see, the main task a web crawler needs to do is written inside the run() method.
Is this because the run() method will handle multiple threads concurrently?

If so, can somebody help me understand how to write code (in a separate Java file) that runs multiple threads on this run() method of Webcrawler2, so that the class actually makes use of threads?

I would highly appreciate any help. Come on, friends, help me out...

- Dhansumaal
[ September 24, 2008: Message edited by: marlajee Borstone ]
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24183
    

Please don't do this. Seriously.


[Jess in Action][AskingGoodQuestions]
marlajee Borstone
Ranch Hand

Joined: Jun 26, 2008
Posts: 35
Sorry Ernest, I don't understand what you mean.

-Dhansumaal
[ September 23, 2008: Message edited by: marlajee Borstone ]
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18754
    

Why have we used a Thread here? As far as I can see, the main task a web crawler needs to do is written inside the run() method.
Is this because the run() method will handle multiple threads concurrently?


Actually, no. This run() method will be called from a thread, which enables it to run concurrently with the thread that started it. It is the Thread class that deals with the threads, not the run() method. The run() method is simply the code that the thread will run.

As for how this is done, you need to examine the code that instantiates and starts the class whose code you have shown.
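To illustrate the distinction, here is a minimal, self-contained sketch. The names RunVsThreadDemo and CrawlTask are hypothetical (the original crawler code is not shown in this thread), and run() only records which thread executed it instead of crawling anything. Calling run() directly executes it on the calling thread; handing the Runnable to a Thread and calling start() makes the new thread execute it while main keeps going.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RunVsThreadDemo {

    // Hypothetical stand-in for the crawler's run() method: it only records
    // which thread executed it, so we can see who actually runs the code.
    static class CrawlTask implements Runnable {
        final List<String> log = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void run() {
            log.add("run() executed by " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CrawlTask task = new CrawlTask();

        // Calling run() directly: no new thread is created, main executes it itself.
        task.run();

        // Passing the same Runnable to a Thread and calling start():
        // the new thread executes run() while main carries on.
        Thread searchThread = new Thread(task, "searchThread");
        searchThread.start();
        System.out.println("main continues while searchThread works");

        searchThread.join(); // wait only so the log is complete before printing it
        task.log.forEach(System.out::println);
    }
}
```

Note that start() is what creates the new thread; calling run() yourself is just an ordinary method call.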

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
marlajee Borstone
Ranch Hand

Joined: Jun 26, 2008
Posts: 35
Hi Henry, thank you very much for your quick and helpful response. But it seems I am still some way from the answer I am looking for... Sorry to bother you again, but I really need help with this.

This run() method will be called from a thread, which enables it to run concurrently with the thread that started it. It is the Thread class that deals with the threads, not the run() method. The run() method is simply the code that the thread will run.


Yes, I can see that from my code above as well: there, 'searchThread' is what enables the run() method to execute.

But my code above is just a utility: it does not have a main method. So when I write a test class like this:

I can see that it starts Wcrawler2 and crawls all the URLs starting from my Tomcat server's index page.
But it is just single-threaded processing, without any use of threads. I want to make use of the thread in Wcrawler2, so that when two search requests with two different URLs arrive at the same time, they are handled concurrently.
How should I modify my Test class above to do that?
I look forward to your suggestions.

~Dhansumal
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18754
    

I can see that it starts Wcrawler2 and crawls all the URLs starting from my Tomcat server's index page.
But it is just single-threaded processing, without any use of threads. I want to make use of the thread in Wcrawler2, so that when two search requests with two different URLs arrive at the same time, they are handled concurrently.
How should I modify my Test class above to do that?


First of all, I really think you should get started on learning threads. Threads are not something you can just add to your application without a clear understanding of how they work. You can start with the Sun Threads Tutorial...

http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html

Now to answer your question: if you examine your test class (for example, by putting a print statement after the startSearch() method call), you will see that the search actually runs concurrently, because the main() method returns from the search call immediately and exits.

This means that the main thread (running the main() method) can create another instance and start another search; your test code simply stops after the first one.
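A minimal sketch of that idea, assuming (hypothetically) that the crawler's startSearch() creates and starts its own thread and then returns immediately. Crawler, startSearch(), awaitCompletion() and the localhost URLs are illustrative stand-ins, not the actual Wcrawler2 API, and run() only simulates a crawl so the example needs no network access.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class TwoSearchesDemo {

    // Hypothetical stand-in for the crawler class: startSearch() starts a
    // background thread and returns at once.
    static class Crawler implements Runnable {
        private final String startUrl;
        private final List<String> results; // shared, so it must be thread-safe
        private Thread searchThread;

        Crawler(String startUrl, List<String> results) {
            this.startUrl = startUrl;
            this.results = results;
        }

        void startSearch() {
            searchThread = new Thread(this, "search-" + startUrl);
            searchThread.start(); // returns immediately; run() proceeds in the background
        }

        void awaitCompletion() throws InterruptedException {
            searchThread.join();
        }

        @Override
        public void run() {
            // Placeholder for the real crawl loop (fetch page, extract links, ...).
            results.add("crawled " + startUrl + " on " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> results = new CopyOnWriteArrayList<>();

        // The main thread starts two independent searches; each gets its own thread.
        Crawler first = new Crawler("http://localhost:8080/", results);
        Crawler second = new Crawler("http://localhost:8080/docs/", results);
        first.startSearch();
        second.startSearch();
        System.out.println("both searches started; main is free to do other work");

        // Optional: wait for both searches before reporting the results.
        first.awaitCompletion();
        second.awaitCompletion();
        results.forEach(System.out::println);
    }
}
```

Each call to startSearch() gets its own thread, so the two searches proceed independently while main() continues. For many concurrent searches, a java.util.concurrent.ExecutorService with a fixed-size thread pool is usually a better fit than creating one Thread per search.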

Henry
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18754
    

Originally posted by marlajee Borstone:
Sorry Ernest, I don't understand what you mean.

-Dhansumaal


I think what EFH is trying to say (and I totally agree) is that web crawling can cause huge problems for websites, and should be left to the professionals.

Even Google can overwhelm a site when it is crawling, to the point where the effect is similar to a DoS attack.

If you are not careful, you can find yourself being blocked from the sites that you are trying to scrape data from.
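One common way to avoid hammering a site is a per-host politeness delay. The sketch below is a hypothetical helper (PoliteFetchGate is not part of any library): each crawler thread would call awaitTurn(url) before fetching a page, and requests to the same host are then spaced at least minDelayMillis apart, even when several threads crawl that host. The 500 ms delay and localhost URLs in main() are arbitrary assumptions, and nothing is actually fetched.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PoliteFetchGate {
    private final long minDelayMillis;
    // For each host, the earliest time (epoch ms) at which the next request may be sent.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public PoliteFetchGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /** Blocks until a request to the URL's host is allowed, reserving the next slot. */
    public void awaitTurn(String url) throws InterruptedException {
        String host = URI.create(url).getHost();
        long waitMillis;
        synchronized (this) {
            long now = System.currentTimeMillis();
            long slot = Math.max(now, nextAllowed.getOrDefault(host, now));
            nextAllowed.put(host, slot + minDelayMillis); // reserve the following slot
            waitMillis = slot - now;
        }
        // Sleep outside the lock so threads crawling other hosts are not blocked.
        if (waitMillis > 0) {
            Thread.sleep(waitMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PoliteFetchGate gate = new PoliteFetchGate(500);
        for (String url : new String[] {"http://localhost:8080/a", "http://localhost:8080/b"}) {
            gate.awaitTurn(url); // the second call waits about 500 ms
            System.out.println("would fetch " + url);
        }
    }
}
```

A real crawler would typically also honor each site's robots.txt rules.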

Henry
marlajee Borstone
Ranch Hand

Joined: Jun 26, 2008
Posts: 35
Thank you, Henry, for this clarification, as well as for your previous reply.
It is really very helpful.

~Dhansumaal
 