I wrote the small program below that sums the contents of all the elements in a long array and stores the result in the last element of the array. I implemented the algorithm both with and without Fork/Join and compared the times, expecting to confirm the performance advantage of fork/join. Surprisingly, the fork/join version took much longer.
Can someone explain why this happened?
On line 24 I am using an AtomicLongArray; however, I also tried plain Java addition inside a synchronized block, which took less time, but still more than the sequential version. In the latter case the synchronized block was necessary, because without it the result was inconsistent and incorrect due to thread interference.
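For reference, the shared-accumulator variant I describe can be sketched roughly like this (the class and field names are my own, not from the original program; the actual line 24 code is not shown here):

```java
// Hypothetical sketch of the synchronized-accumulator variant described above.
// Without the synchronized blocks, two threads can both read the same old
// value of `sum` and one of the two updates is lost (a lost-update race),
// which is why the unsynchronized version produced inconsistent results.
class SharedSum {
    private long sum = 0;
    private final Object lock = new Object();

    void add(long value) {
        synchronized (lock) { // removing this reintroduces the race
            sum += value;
        }
    }

    long get() {
        synchronized (lock) {
            return sum;
        }
    }
}
```

Every worker thread contends for the same lock on every addition, which is part of the overhead discussed in the answer below.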
There are so many reasons for this that it is best to start here: The Calamity
The F/J framework needs to "warm up." There have been discussions about this on the concurrency-interest mailing list. The framework was designed for compiled code, so it takes a while for the JIT compiler to kick in.
Then there is the work-stealing problem. All forked tasks go into one deque and every thread has to go looking for work there. That is slow, and it is one reason your simple code runs faster.
Unnecessary synchronization slows the parallel version down further.
In each task (which should be a RecursiveTask, not a RecursiveAction), compute the sum from start to end into a local variable (adding in the values from the join()ed sub-tasks) and return it as the task's result. Set the last element only once, at the very end, using the result of the main task.
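The restructuring above can be sketched like this (a minimal version under my own assumptions: the class names, the threshold value, and the pool setup are illustrative, not taken from your program):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Each task returns its local sum; no AtomicLongArray or synchronized
// block is needed, because no shared state is written during the computation.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed sequential cutoff
    private final long[] array;
    private final int start, end; // half-open range [start, end)

    SumTask(long[] array, int start, int end) {
        this.array = array;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: sum sequentially into a local variable.
            long sum = 0;
            for (int i = start; i < end; i++) sum += array[i];
            return sum;
        }
        int mid = (start + end) >>> 1;
        SumTask left = new SumTask(array, start, mid);
        SumTask right = new SumTask(array, mid, end);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return rightSum + left.join();   // combine the two local results
    }
}

public class Main {
    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        data[data.length - 1] = total; // write the last element only once, at the end
        System.out.println(total);     // prints 1000000
    }
}
```

Computing `right` directly in the current thread instead of forking both halves halves the number of tasks that land in the deque.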
But since a simple array sum is limited more by memory bandwidth than by computing power, a parallel algorithm won't be faster than a simple loop anyway.