Something doesn't make sense in my code in terms of speed of execution. The computer I'm using is a Power Mac with 16 GB of RAM and 16 cores. ReadTaskThread simply goes to the DB to retrieve information (a SELECT). It returns a list of items, and apparently it takes the same time to fetch 10 items or 1000 items (give or take 2 sec).
Problem 1: When I use nProcessors=2 I get the best performance (execution in 1.2 min). When I use nProcessors=3 or more, execution takes 7+ min (the worst).
Problem 2: When the list has 1000 items, the consumer takes some time to process it, so the producer has more time to fetch new data from the DB. When the list is small, the delay is HUGE because the consumer is waiting for the producer to get the data.
Question 1: I was under the impression that the more processors in use the better. Why is this not the case in the scenario I presented?
Question 2: Is newFixedThreadPool the right one to use?
Also, unless you are using an embedded DB, you may want to look at your database server too. Taking 4 times longer with one extra request in parallel seems weird to me, unless of course you are doing more work with the extra request.
And I also don't think you are consuming the data very efficiently. You are consuming the results in the order in which the tasks were submitted, rather than the order in which they complete. Perhaps you should change the way the tasks object is used. For example, make it a BlockingQueue implementation and pass it to both the producer and the consumer. The producer can then put its results into the queue directly, and the consumer takes them in the order they become available, which may decrease the time spent waiting for a specific producer.
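The BlockingQueue idea above could look roughly like the following. This is only a minimal sketch, not your actual code: the class name QueueHandoffSketch and the fetchBatch method are hypothetical, and fetchBatch merely stands in for the SELECT that ReadTaskThread performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueHandoffSketch {
    // Hypothetical stand-in for the SELECT done by ReadTaskThread.
    static List<Integer> fetchBatch(int producerId, int batchSize) {
        List<Integer> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            batch.add(producerId * 1000 + i);
        }
        return batch;
    }

    // Starts `producers` producer tasks, consumes their batches on the
    // calling thread, and returns the total number of items consumed.
    static int run(int producers, int batchSize) throws InterruptedException {
        BlockingQueue<List<Integer>> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            final int id = p;
            // Each producer puts its result straight into the shared queue.
            pool.execute(() -> queue.add(fetchBatch(id, batchSize)));
        }
        int consumed = 0;
        for (int i = 0; i < producers; i++) {
            // take() blocks until ANY batch is ready, so the consumer never
            // waits on one specific producer while others have finished.
            List<Integer> batch = queue.take();
            consumed += batch.size();
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consumed " + run(4, 10) + " items");
    }
}
```

If you would rather keep submitting Callables and collecting Futures, java.util.concurrent.ExecutorCompletionService gives a similar "first finished, first consumed" behaviour.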
One last thing... you might consider working on the balance between producers and consumers as well. Starting with your current thread pool, turn your consumer into a Runnable that gets executed in the pool (which takes one thread away from the producers) and measure the results. Then add a second consumer, and a third, etc., and see if there is some balance that optimizes performance. If the consumer's job is processor-intensive, having several consumers running while each producer is waiting on the database may give better performance. And you don't necessarily have to limit yourself to 16 threads total. If your DB task is talking to a remote DB and the DB operation takes some time, then the producers aren't using the processors while they wait on the DB. That is the perfect time to let something else use the processor, like a consumer or another producer.
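One way to run that experiment is sketched below, under some loud assumptions: BalanceSketch, BATCH_SIZE, the 5 ms sleep that imitates DB latency, and the empty-array POISON marker used to stop the consumers are all made up for illustration. The pool size stays fixed and each extra consumer takes one thread away from the producers, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class BalanceSketch {
    static final int BATCH_SIZE = 100;       // assumed items per fetch
    static final int[] POISON = new int[0];  // stop marker, compared by identity

    // Splits a fixed pool into (poolSize - consumers) producers and
    // `consumers` consumers; returns the number of items processed.
    static long run(int poolSize, int consumers, int totalBatches) throws Exception {
        int producers = poolSize - consumers;
        if (producers < 1 || consumers < 1) {
            throw new IllegalArgumentException("need at least one producer and one consumer");
        }
        BlockingQueue<int[]> queue = new LinkedBlockingQueue<>();
        AtomicInteger nextBatch = new AtomicInteger();
        AtomicLong processed = new AtomicLong();

        Callable<Void> producer = () -> {
            // Producers share one counter so total work is the same in every run.
            while (nextBatch.getAndIncrement() < totalBatches) {
                Thread.sleep(5);                 // simulated DB latency (assumption)
                queue.put(new int[BATCH_SIZE]);  // a "fetched" batch
            }
            return null;
        };
        Callable<Void> consumer = () -> {
            for (int[] batch = queue.take(); batch != POISON; batch = queue.take()) {
                // Real CPU-bound processing of the batch would go here.
                processed.addAndGet(batch.length);
            }
            return null;
        };

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Void>> producerJobs = new ArrayList<>();
            List<Future<Void>> consumerJobs = new ArrayList<>();
            for (int i = 0; i < producers; i++) producerJobs.add(pool.submit(producer));
            for (int i = 0; i < consumers; i++) consumerJobs.add(pool.submit(consumer));
            for (Future<Void> f : producerJobs) f.get();             // wait for all fetches
            for (int i = 0; i < consumers; i++) queue.put(POISON);   // one marker per consumer
            for (Future<Void> f : consumerJobs) f.get();             // wait for the queue to drain
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        int poolSize = 16;
        for (int consumers = 1; consumers <= 4; consumers++) {
            long start = System.nanoTime();
            long items = run(poolSize, consumers, 64);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(consumers + " consumer(s): " + items + " items in " + ms + " ms");
        }
    }
}
```

The timings from a toy like this say nothing about your real workload; the point is only the shape of the experiment. Swap in the real fetch and the real processing, then compare the numbers for each split.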
Allocating new objects can still be costly when many cores are in use. With one or two cores it may not be important (or even recommended) to care much about object reuse, but with 10 or more cores, "traditionally inappropriate" approaches like object pooling and reuse may help performance. The best approach is to allocate all needed objects outside the critical section and not allocate any new objects in the main loops. We were able to speed up some programs by a factor of several after fixing these issues.
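As a rough illustration of "allocate outside the main loop", compare the two methods below. ReuseSketch and its method names are hypothetical, and whether the reusing variant is actually faster depends on the JVM, the garbage collector, and the object sizes, so treat it as something to measure rather than a guaranteed win.

```java
class ReuseSketch {
    // Allocates a fresh scratch buffer for every item (allocation inside the loop).
    static long checksumAllocating(int items, int size) {
        long total = 0;
        for (int i = 0; i < items; i++) {
            int[] scratch = new int[size];
            total += fill(scratch, i);
        }
        return total;
    }

    // Allocates one scratch buffer up front and reuses it for every item.
    static long checksumReusing(int items, int size) {
        int[] scratch = new int[size];
        long total = 0;
        for (int i = 0; i < items; i++) {
            total += fill(scratch, i);
        }
        return total;
    }

    // Placeholder work: overwrites the whole buffer and returns the sum of its contents.
    static long fill(int[] scratch, int seed) {
        long sum = 0;
        for (int j = 0; j < scratch.length; j++) {
            scratch[j] = seed + j;
            sum += scratch[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(checksumAllocating(1000, 256) == checksumReusing(1000, 256));
    }
}
```

In a multi-threaded setup each worker needs its own reusable buffer (for example one per task object or via ThreadLocal); sharing a single buffer between threads would introduce a data race.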