I am currently running a Java application that starts 50 threads at a time. Each thread performs numerous page downloads from the web, so at the start (when all 50 threads are running) the process is very fast. Once nearly 40 threads have finished and only the remaining ones are working, it takes a lot of time, which also delays the completion of the whole process. I think I am wasting resources towards the end of the process. Can anyone please suggest an idea to overcome this?
Are all web downloads equal? Or are there some painfully slow ones? Meaning... are you sure the slowdown is in your program, and not simply that you are downloading from some web sites that take a long time to respond?
Every page download takes approx. 4 seconds, from the start to the end of the process. Moreover, every site takes the same time. What happens is that towards the end of the process only, say, 3 threads are running, so only 3 pages are downloaded in 4 seconds (if this is a sensible calculation). At the start of the process, approx. 50 threads are running, so obviously more pages are downloaded in the same 4 seconds (I guess this kind of calculation is not bad).
Then you can't really do anything about it, right? If the last iteration only has 4 web sites to load, then you can't exactly keep 50 threads busy. It's not a resource issue or an efficiency issue; there is simply no more work left to do.
From what you write, it seems to me the problem might lie in an inefficient distribution of the work among the threads. If there are a few trailing threads with lots of requests still to do, you could try to distribute the work among all threads. For example, if the requests were all put into a single queue (thread-safe, of course), then each thread would take the next page to load from that queue and would finish only when there is no more work to do. New requests (if any) would simply be added to the queue.
By the way, queues are already implemented in the Collections framework (see the java.util.Queue interface).
Yes, I think this can resolve the problem. Should the queue be used from the start of the process, or only from the point where the trailing threads still have a lot of work to do? Which will be more efficient? Can you also please explain the usage, limitations and concept of Producer-Consumer in threads?
I'd definitely implement the queue for the whole process, right from the start. Create a single queue (probably ConcurrentLinkedQueue, see if it fits your needs) and put all initial requests into it. Pass the queue to the individual worker threads. If a worker thread generates new requests (say, to download some pages referenced from the page it just processed), it simply adds them to the queue. When the queue is empty, the worker thread quits. The worker's main method can be very simple: a loop that takes the next request from the queue, processes it, and adds any new requests back to the queue (that's just an illustration, you'll have to refine and expand it to suit your needs).
If your threads do not create new requests, it would be even simpler: there would be just the processRequest method called inside the loop.
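To make the idea concrete, here is a minimal sketch of such a worker loop. It is only an illustration under my own assumptions: the class name QueueWorkers, the drain method and the processRequest function parameter are hypothetical names, and the actual page download is left to whatever you pass in as processRequest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class QueueWorkers {

    /**
     * Starts threadCount workers that all pull from the same queue.
     * processRequest (hypothetical) handles one URL and returns any new
     * URLs it discovered; return an empty list if there are none.
     * Returns the number of requests processed.
     */
    static int drain(Queue<String> queue, int threadCount,
                     Function<String, List<String>> processRequest) {
        AtomicInteger processed = new AtomicInteger();

        Runnable worker = () -> {
            String url;
            // poll() returns null when the queue is empty, which ends the loop
            while ((url = queue.poll()) != null) {
                List<String> newRequests = processRequest.apply(url);
                queue.addAll(newRequests); // new work goes back to the shared queue
                processed.incrementAndGet();
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(worker);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every worker has run out of work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 200; i++) {
            queue.add("http://example.com/page" + i); // placeholder URLs
        }
        // Placeholder processing step: no download, no new requests.
        int done = drain(queue, 50, url -> Collections.emptyList());
        System.out.println("Processed " + done + " requests");
    }
}
```

One thing to be aware of in this sketch: a worker that finds the queue momentarily empty quits, even if another worker is about to add new requests. All requests still get processed (the worker that added them keeps looping), but fewer threads may be left for them.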
If you need more info on that design pattern, please use Google. There are lots and lots of descriptions and tutorials on the web, most of them much better than I'd be able to write here. Moreover, this pattern, though very powerful, is conceptually very simple, so don't be afraid of it. The really hard part is to make the queue thread-safe and well-performing, and that is already done in the JDK. Note that in this specific case, the worker threads would be both producers and consumers.
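For comparison, here is a rough sketch of the classic form of the pattern, with one producer and several separate consumers sharing a BlockingQueue from java.util.concurrent. The class name ProducerConsumerSketch, the run method, the STOP marker and the queue capacity of 10 are all my own assumptions for illustration, and the "download" is just a counter increment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class ProducerConsumerSketch {

    // Hypothetical end-of-work marker; each consumer stops when it takes one.
    private static final String STOP = "\u0000STOP";

    /** Feeds urls to consumerCount consumers; returns how many were consumed. */
    static int run(List<String> urls, int consumerCount) {
        // Bounded queue: put() blocks the producer while the queue is full.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
        AtomicInteger consumed = new AtomicInteger();

        Thread producer = new Thread(() -> {
            try {
                for (String url : urls) {
                    queue.put(url);
                }
                // One STOP marker per consumer, placed after all real work.
                for (int i = 0; i < consumerCount; i++) {
                    queue.put(STOP);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        List<Thread> threads = new ArrayList<>();
        threads.add(producer);
        for (int i = 0; i < consumerCount; i++) {
            threads.add(new Thread(() -> {
                try {
                    while (true) {
                        String url = queue.take(); // blocks while the queue is empty
                        if (url.equals(STOP)) {
                            break;
                        }
                        consumed.incrementAndGet(); // placeholder for the real download
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return consumed.get();
    }
}
```

The main limitation to keep in mind is shutdown: the consumers need some way to learn that no more work is coming. Here that is the STOP marker; other designs are possible.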
I used this, and my problem is resolved. Now I can finish the entire process at almost the same speed right to the end.