I have an ExecutorService which I'm using to solve a recursive problem. Initially I call submit on the pool, passing it the entire task to solve. The task class implements Callable, and its call method does some calculations and then breaks the problem into two pieces, which I want to solve using other threads on the same pool. So in the task constructor I pass a reference to the pool, so that call can submit the two pieces and they can be solved by pool threads as well. My problem is that I then want to call "get" to wait for the results. Obviously, if the pool has just one thread, for example, that thread will call submit and then get, and it will wait forever, since there is no other thread to run the tasks it just submitted.
So the bigger my problem is, the more threads I need in my pool. I have "solved" this by using a CachedThreadPool, which gave me another problem: it creates threads on demand, which is not what I want either.
I want a pool with a fixed number of threads. Can you give a suggestion?
Let's take a merge-sort as a typical example of this problem. Here's an implementation that looks reasonable, but is broken:
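A sketch of such an implementation might look like the following (the class and method names are mine, not the original poster's):

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class BrokenParallelMergeSort {

    // Sorts a[from..to) using the given pool. Broken: every recursive call
    // submits a subtask and then blocks in get(), so a fixed-size pool can
    // run out of free threads and deadlock.
    public static void sort(int[] a, int from, int to, ExecutorService pool)
            throws Exception {
        if (to - from < 2) {
            return;
        }
        int mid = (from + to) >>> 1;
        // First half goes to the pool as a new task...
        Future<?> future = pool.submit(() -> {
            sort(a, from, mid, pool);
            return null;
        });
        // ...second half is sorted by the current thread.
        sort(a, mid, to, pool);
        future.get(); // blocks; deadlocks if no thread is free for the subtask
        merge(a, from, mid, to);
    }

    static void merge(int[] a, int from, int mid, int to) {
        int[] left = Arrays.copyOfRange(a, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
    }
}
```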
Since the two halves of the array can be sorted independently, it stands to reason we can sort them concurrently: the first half as a new task for the thread pool, the second half by the current thread. When the current thread is done sorting, it waits for the thread pool to finish sorting the first half, and then it merges the two halves.
There's a problem with this approach though. Since the sorting is done recursively, each call submits half of its part of the array to the thread pool. Sorting an array of length 8 means submitting 7 tasks to the thread pool. If the pool never has more than 6 threads available, the program is likely to deadlock: threads in the pool block waiting for deeper tasks to complete, and those tasks never run because no threads are left to execute them.
Instead, you should tell the method the maximum number of tasks it may submit, and halve that number for each recursive call. If a call has no submissions left, it sorts both halves in the current thread. Another solution is to communicate with the thread pool in a synchronized way to find out whether there are idle threads that can execute a new task, but this may actually end up slowing down the entire process.
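The task-budget idea might be sketched like this (again with names of my own choosing; keeping the budget at or below the pool's thread count leaves at least one thread free for leaf tasks, which is what avoids the deadlock):

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class BoundedParallelMergeSort {

    // Sorts a[from..to), submitting on the order of maxTasks subtasks in
    // total. Choose maxTasks no larger than the pool's thread count so that
    // a thread is always available for the leaf tasks.
    public static void sort(int[] a, int from, int to,
                            ExecutorService pool, int maxTasks) throws Exception {
        if (to - from < 2) {
            return;
        }
        int mid = (from + to) >>> 1;
        if (maxTasks <= 0) {
            // Out of budget: sort both halves in the current thread.
            sort(a, from, mid, pool, 0);
            sort(a, mid, to, pool, 0);
        } else {
            // Spend one submission on the first half, halve the budget below.
            Future<?> future = pool.submit(() -> {
                sort(a, from, mid, pool, maxTasks / 2);
                return null;
            });
            sort(a, mid, to, pool, maxTasks / 2);
            future.get();
        }
        merge(a, from, mid, to);
    }

    static void merge(int[] a, int from, int mid, int to) {
        int[] left = Arrays.copyOfRange(a, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
    }
}
```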
Disclaimer: I didn't compile or test any of the code in this post, so there may be loads of errors.
Here is some tested code that performs merge-sort using multiple threads. You specify a file, and optionally the number of threads and an insertion-sort threshold.
The program breaks the file up into tokens, sorts them, and reports how long it took. The threshold is the minimum size a sub-array must have for merge-sort to be used; below that, insertion sort is used. A threshold of 1 means a pure merge-sort.
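Stripped of the file reading and timing, the core of such a hybrid sort could look along these lines (a sketch with illustrative names, not the posted program itself; the task budget works as in the previous answer):

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class HybridParallelMergeSort {

    // Sorts words[from..to) with on the order of maxTasks pool submissions.
    // Sub-arrays shorter than threshold are handled by insertion sort; a
    // threshold of 1 means a pure merge-sort.
    public static void sort(String[] words, int from, int to,
                            ExecutorService pool, int maxTasks, int threshold)
            throws Exception {
        if (to - from < 2) {
            return;
        }
        if (to - from < threshold) {
            insertionSort(words, from, to);
            return;
        }
        int mid = (from + to) >>> 1;
        if (maxTasks <= 0) {
            sort(words, from, mid, pool, 0, threshold);
            sort(words, mid, to, pool, 0, threshold);
        } else {
            Future<?> f = pool.submit(() -> {
                sort(words, from, mid, pool, maxTasks / 2, threshold);
                return null;
            });
            sort(words, mid, to, pool, maxTasks / 2, threshold);
            f.get();
        }
        merge(words, from, mid, to);
    }

    static void insertionSort(String[] a, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            String key = a[i];
            int j = i - 1;
            while (j >= from && a[j].compareTo(key) > 0) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    static void merge(String[] a, int from, int mid, int to) {
        String[] left = Arrays.copyOfRange(a, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) {
            a[k++] = (left[i].compareTo(a[j]) <= 0) ? left[i++] : a[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
    }
}
```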
On my system, using 8 threads and an insertion threshold of 16, it takes around 250 ms to sort about a million strings. This is almost twice as fast as using Arrays.sort() to sort the same tokens.