File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes Programmer Certification (SCJP/OCPJP) and the fly likes Performance comparison fork/join vs normal loop Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)
Bookmark "Performance comparison fork/join vs normal loop" Watch "Performance comparison fork/join vs normal loop" New topic

Performance comparison fork/join vs normal loop

Marco Farrugia
Ranch Hand

Joined: Jul 20, 2012
Posts: 34
I wrote the small program below that sums the content of all the elements in an long array and puts the result in the last element of the array. The algorithms was implemented both using Fork/Join and without and the time taken was compared to confirm the performance advantage of fork/join. Surprisingly the fork/join took much longer.

Can someone understand and explain to me why did this happen?

In line 24, I am using an AtomicLongArray, however I also tried using normal Java addition in a synchronized block which took less time, but still more than the normal way. In the latter case I had to use a synchronized block, as without it the result was inconsistent and incorrect due to thread interference.

Below find the output:

And find the code hereunder:

Edward Harned
Ranch Hand

Joined: Sep 19, 2005
Posts: 291

There are so many reasons that it is best to start here: The Calamity

The F/J framework needs to "warm up." There have been discussions about this on the concurrency-interest list. The framework was designed for compiled code. So it takes a while for the JIT compiler to perform.

Then there is the work-stealing problem. All forked tasks go into one deque and all the threads have to go looking for work. SLOW. This is one reason your simple code works faster.

The article lists others.

Ed's latest article: A Java Reactive Generator
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Unnecessary synchronization slows that down.
In each task (which needs to be RecursiveTask not RecursiveAction) compute sum from start to end to local variable (using values from join()ed sub-tasks) and return that as result of the task. Set the last element only at the very end (using result of main task).

But since simple sum of an array is more limited by memory speed than computing power, parallel algorithm won't be faster than simple loop.

Edward Harned
Ranch Hand

Joined: Sep 19, 2005
Posts: 291

The performance boost comes from the size of the array. When you get into millions of elements and many CPUs then F/J is worthwhile (once it warms up.)

You might want to try the TymeacDSE project mentioned in the article. It scatters the array subsets to all the threads using work-sharing. Much, much faster.
Jayesh A Lalwani
Saloon Keeper

Joined: Jan 17, 2008
Posts: 2746

Wow, Edward, that article of yours is a very good read. Thanks for posting that.
I agree. Here's the link:
subject: Performance comparison fork/join vs normal loop
It's not a secret anymore!