ConcurrentLinkedQueue fails to add data in multi-threaded environment.

Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
I am using a ConcurrentLinkedQueue that is shared across multiple threads, which are started from a daemon thread.
Whenever a thread finishes processing, it adds data to the queue.

The queue holds at most 8000 entries at any point in time, because processing is batched.

Assume there are 5 daemon threads running continuously in a while(true) loop. Each daemon thread has its own private ConcurrentLinkedQueue instance variable.
Each daemon thread fires a maximum of 50 threads (using ExecutorService). Each thread adds data to the ConcurrentLinkedQueue that is passed to it, so in short each ConcurrentLinkedQueue is shared between 50 threads.
The daemon thread waits for all 50 threads to complete before going back to the start of the while(true) loop. Please note that the queue is cleared before it reaches the start of the while(true) loop again.

All of this runs continuously, and billions of records are processed.
In extremely rare instances, data that was processed in the threads was not added to the queue. (Rare here means 3 records out of 4 billion.)
No exception is reported, and the threads catch everything up to Throwable.

Is there any known bug in the add or offer operation of ConcurrentLinkedQueue that might make it fail in a multi-threaded environment?
We are using WebLogic 12c as the application server
JDK 1.6 Update 33.
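
For readers following along, here is a minimal sketch of one batch as described above: a pool of worker threads, each task adding one record to a shared ConcurrentLinkedQueue, with the caller waiting for all tasks before reading the size. The names `BatchSketch` and `runBatch` and the record strings are hypothetical, and waiting via `invokeAll` is an assumption; the real application may wait differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchSketch {
    // Runs one batch of batchSize tasks on poolSize threads. Each task adds one
    // record to the shared queue. Returns the queue size seen after all tasks finish.
    static int runBatch(int batchSize, int poolSize) throws Exception {
        final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<Callable<Boolean>>();
            for (int i = 0; i < batchSize; i++) {
                final String record = "record-" + i; // hypothetical payload
                tasks.add(new Callable<Boolean>() {
                    public Boolean call() {
                        return queue.add(record);
                    }
                });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                result.get(); // rethrows a task failure as ExecutionException
            }
            return queue.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("queue size " + runBatch(8000, 50));
    }
}
```

Calling `get()` on every Future means a task that died with an exception cannot go unnoticed by the caller.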
Luan Cestari
Ranch Hand

Joined: Feb 07, 2010
Posts: 163

It seems to be an issue. I have already run into a similar one. The main problem is searching Oracle's bug database for bugs in its JVM. You can try JRockit (as it was built by BEA, it might not have this bug). I would also recommend opening a bug report with Oracle.


Please, visit me for some cool tech post at www.ourdailycodes.com
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Thanks Luan. I searched a lot but could not find anything. There is a bug reported about remove, but not about add. Can you provide me with the link for this bug?
Luan Cestari
Ranch Hand

Joined: Feb 07, 2010
Posts: 163

You mean the similar issue? I do not have the link to that problem at hand (I solve tons of problems at work). We could do what was done for the remove bug and create a reproducer of the issue (link http://bugs.sun.com/view_bug.do?bug_id=6493942 ).
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
I am aware of this bug. What I am facing concerns a very basic function of ConcurrentLinkedQueue. It is designed to accept elements from multiple threads, but it fails once in a billion iterations. It takes me days to reproduce this. Can Java really make such a basic mistake?
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Anything is possible, so it could be a bug in the JRE (it is software, so there can be bugs). The first thing to do would be to test on a current version of Java and see if you get the same behavior. If so, I would first assume it was a bug in my code. To troubleshoot, I would first try to make the problem show up more quickly. Create a test application that does what you think has the bug in it, and isolate it from the main application. If the problem occurs 1 in a billion entries, is that the total number of entries? Can you make it do nothing but add entries? How fast can you get to 1 or 2 billion additions? Is it caused by rare Thread collisions? How about using more threads? Or running on a lower-resource system? Or running on a higher-resource system? Can any of that make it happen quicker? Then try to track exactly where the problem is: can you detect when the count of objects in the queue is wrong? If so, create thread dumps for all threads, then sift through them to see what is going on. Or can you detect when it went wrong and then make it right (adding the value back in...)?
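
As a rough sketch of such an isolated test, assuming nothing but concurrent adds is wanted: the harness below releases all threads at once through a start gate to increase contention, then compares the queue size with the expected count. The class name `AddStress`, the method `missingAfterRound`, and the thread and round counts are hypothetical choices, not taken from the original application.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

class AddStress {
    // Starts `threads` threads that each add `perThread` items once the start
    // gate opens. Returns the number of items missing afterwards.
    static int missingAfterRound(int threads, final int perThread) throws InterruptedException {
        final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
        final CountDownLatch startGate = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    try {
                        startGate.await(); // wait so that all threads start adding together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < perThread; i++) {
                        queue.add(i);
                    }
                }
            });
            workers[t].start();
        }
        startGate.countDown();
        for (Thread worker : workers) {
            worker.join(); // join makes each worker's adds visible to this thread
        }
        return threads * perThread - queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int round = 1; round <= 1000; round++) {
            int missing = missingAfterRound(50, 160); // 50 * 160 = 8000 adds per round
            if (missing != 0) {
                System.err.println("round " + round + ": missing " + missing + " items");
                return;
            }
        }
        System.out.println("no missing items");
    }
}
```

Raising the round count or the thread count is the cheap way to reach billions of additions faster than the full application would.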


Steve
Luan Cestari
Ranch Hand

Joined: Feb 07, 2010
Posts: 163

The problem may also occur only on a specific OS; the 64-bit vs. 32-bit architecture can also have an influence.
Luan Cestari
Ranch Hand

Joined: Feb 07, 2010
Posts: 163

My girlfriend and I wrote a post about this problem and created a project on GitHub to simulate it. We tried a lot but we didn't find the problem =/ Here is the post for you to take a look at (maybe you could use the code to try it in your environment) -> http://www.ourdailycodes.com/2013/09/is-it-bug-on-jvm-16.html
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Thanks all for your help. I have written a small prototype along similar lines and will be running it on a Windows 64-bit system. I am going to run it for several days and will let you know if I encounter this problem.
The problem is extremely rare, so I really can't say whether it is the JVM, the OS, WebLogic or maybe our code.
But the probability of the code being the issue looks negligible, because this is a performance testing environment where the load is the same on all days and a tool feeds transactions 24 x 7 at the same pace.
It has occurred only 3 times in a period of 1 month with the constant load mentioned above.



Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

Do you mind posting your prototype here? Just having some eyeballs on your code might help.
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1


In the above code, if the line below is printed, it means there is a bug:
System.out.println("queue size" +q.size());
What happened in our system was that 3 times q.size() turned out to be 7999 instead of 8000, so it was only 1 less than the expected size.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

Please use code tags while posting code. I have edited the post for you

The bit about the counter seems a little confusing to me. It might just be my tiny brain, but I think there are simpler ways of figuring out whether all your Callables have completed. I am wondering whether the problem is a bug in the counter that causes the queue to go to 8001 in one iteration and 7999 in the next. Can you try simplifying this?
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
The counter cannot be the issue, as you can see if you look at the code. It is decremented before it is passed to the thread and is incremented in the finally block of the thread, which confirms that the callable has completed.
There is a check below that
So it waits till all callables are complete.
So there is no way that it could add an extra value to the queue or leave one out.
I am running this prototype and it has been running fine for the last 4 hours, with no issues so far.
I am going to give this a few more days to see if I can find anything.

I am not really hopeful that I will be able to pin the problem down as a Java bug, an OS issue, WebLogic or our code.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

I think your tracking method is wrong. You are relying on two independent states tracking appropriately in order. That is, you assume that if counter.get() returns a value equal to or greater than batchsize, the last thread has added its item to the queue, so the queue size should be the same as or larger than batchsize. That is guaranteed for single-threaded operation, but is not true for inter-thread operation. It is quite legal for the counter to have its value published to other threads before the queue has its size published to other threads. If you need the counter and the queue size to be consistent with each other, they need to be kept together as an atomic unit (synchronized).
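
One possible reading of "kept together as an atomic unit (synchronized)", as a minimal sketch: guard both the collection and the completion count with a single lock, so that any reader holding that lock sees the two in a consistent state. The class `CountedBatch` and its methods are hypothetical names for illustration and do not come from the prototype.

```java
import java.util.ArrayList;
import java.util.List;

class CountedBatch {
    private final Object lock = new Object();
    private final List<String> items = new ArrayList<String>(); // guarded by lock
    private int completed;                                       // guarded by lock

    // Adds the item and bumps the counter as one indivisible step.
    void addAndCount(String item) {
        synchronized (lock) {
            items.add(item);
            completed++;
        }
    }

    // Returns {completed, items.size()}, both read under the same lock.
    int[] snapshot() {
        synchronized (lock) {
            return new int[] { completed, items.size() };
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final CountedBatch batch = new CountedBatch();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        batch.addAndCount("item-" + i);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        int[] s = batch.snapshot();
        System.out.println("completed=" + s[0] + " size=" + s[1]);
    }
}
```

The trade-off is that a plain ArrayList under a lock serializes all writers, which gives up the lock-free behavior of ConcurrentLinkedQueue in exchange for a count and a size that can never disagree.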
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
It is an AtomicInteger, so it shares the same value across all threads. At the end I check that it is the same as the batch size. What's wrong here?
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
They are together. The addition to the queue comes first, and then the atomic int is incremented.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

Are you sure there is no exception in your callable? It seems like if you are adding a billion elements to the queue, you might be running out of memory
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
The queue holds at most 8000 elements, then it is cleared. If there were an exception I could find the root cause, but unfortunately there is no exception in the callable.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Manoj Kulkarni wrote: They are together. The addition to the queue comes first, and then the atomic int is incremented.


Read up on the memory model. Visibility in textual (program) order is only guaranteed within a single thread. Across threads there is no guarantee that events which happen in one order in the writing thread are seen in that order by other threads, unless there is a shared synchronization barrier. AtomicInteger and ConcurrentLinkedQueue both make synchronization promises, but they use different means and different barriers, and so there is no happens-before guarantee in their interaction.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

I should have linked to the memory model: http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4

The first example (17.4-1) demonstrates the problem. Section 17.4.5 describes what is required for the 'AtomicInteger value write' to have a happens-before relationship with the 'ConcurrentLinkedQueue size read'. There are a couple of simple rules which must apply, and I don't see any of them applying in your application to enforce write-read consistency. (Note: the AtomicInteger value write will have a happens-before relationship with the AtomicInteger value read, and the Queue's size write will have a happens-before relationship with the Queue's size read, but there is nothing to prevent reordering from breaking inter-thread consistency between the two unrelated reads and writes.)
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Just to clarify my understanding here, please refer to my QueueThread class. It is a thread in which the first operation is the add and the next operation is the AtomicInteger increment. Now, when 100 threads are fired in parallel, do you mean to say that within the same thread the add to the queue could happen later and the atomic int increment first? Please note that within the operation, the add to the queue and the atomic int increment are sequential, not parallel, activities.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Manoj Kulkarni wrote: Just to clarify my understanding here, please refer to my QueueThread class. It is a thread in which the first operation is the add and the next operation is the AtomicInteger increment. Now, when 100 threads are fired in parallel, do you mean to say that within the same thread the add to the queue could happen later and the atomic int increment first? Please note that within the operation, the add to the queue and the atomic int increment are sequential, not parallel, activities.

I am saying that, in the main thread, it is possible for the AtomicInteger increment to be seen as happening before the element is added to the Queue, because none of the conditions that force ordering are in place. Whether the two operations can be, or are, reordered in a single thread is immaterial, but I think it would be legal to do so.
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Thanks Steve, that was a great piece of information, and now it makes sense why one entry could be missing. But these cases are extremely difficult to reproduce, so I have to go by the article you shared.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

If you just want to wait till all the submitted callables are done, you can
a) either shut down the executor and await termination,
or
b) if you want to keep the executor running, use the Future returned by the submit method to check the status of the callable.
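
The two options above can be sketched as follows. This is only an illustration: the class `WaitForTasks`, the pool size of 8, and the one-minute timeout are hypothetical choices, and the tasks merely add an Integer to a queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class WaitForTasks {
    // Option (a): shut the executor down and wait for it to drain.
    static int viaAwaitTermination(int tasks) throws InterruptedException {
        final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            pool.submit(new Runnable() {
                public void run() {
                    queue.add(n);
                }
            });
        }
        pool.shutdown(); // accepts no new tasks; already submitted ones still run
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return queue.size();
    }

    // Option (b): keep the executor alive and wait on each Future.
    static int viaFutures(ExecutorService pool, int tasks) throws Exception {
        final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();
        List<Future<?>> futures = new ArrayList<Future<?>>();
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            futures.add(pool.submit(new Runnable() {
                public void run() {
                    queue.add(n);
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // blocks until done; throws ExecutionException if the task failed
        }
        return queue.size();
    }
}
```

Option (b) suits a long-running daemon loop better, since the same pool can be reused for the next batch.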
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
I have a new concern now. Going by the memory model theory, queue.add should get executed under any circumstances, possibly after the AtomicInteger has been incremented.
But there is no evidence of the queue.add call being made, while there is evidence of the AtomicInteger increment. Is it possible for the entire call to be missed in the above prototype?
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Also, one very basic question. If you check the code, the AtomicInteger increment is in the finally block and queue.add is in the try block. Can Java reorder the code so that the finally block executes before the try block? Is that technically possible as per the Java specification? Java does not execute finally before try, so the reordering and memory model analysis looks irrelevant in this case.

http://docs.oracle.com/javase/tutorial/essential/exceptions/finally.html
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2402
    
  28

According to the Java specs, the JVM has to make sure that, within a thread, execution order is the same as program order. The JVM optimizer cannot make the finally block execute before the try block; the Java specs say so.

The problem is that the Java spec doesn't guarantee that other threads "see" the changes in the same order. The JVM can change the order in which other threads see the changes. So, even if you added an object to the queue and then updated the counter in one thread, another thread might see the update to the counter before seeing the update to the queue. It's counterintuitive, so it's hard to wrap your mind around it.

The only way you can make sure that the order is the same is by introducing a shared synchronization barrier. Let's say you had acquired a lock in Thread A, and Thread B tried to acquire the same lock and went into a wait state. Now, Thread A modifies some variables and then releases the lock. When Thread B acquires that lock, the JVM will have to ensure that Thread B can see all the changes made by Thread A before it released the lock.

Another way of ensuring order is by using volatile variables. The JVM has to ensure that any writes to the volatile variable in one thread are immediately visible to all the other threads.

So, what's happening in your code is that there is no shared synchronization barrier between your threads. However, there is a volatile variable; AtomicInteger wraps a volatile int. So, any updates to the counter are guaranteed to be visible immediately across threads. However, I believe ConcurrentLinkedQueue doesn't use volatile (I might be wrong... however it's unlikely that they could ensure everything is volatile). So, it may take time for changes in the Queue to propagate through to your main thread.
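
To make the volatile rule concrete, here is a minimal sketch, assuming the usual "publish through a volatile flag" idiom: under JLS 17.4.5 a write to a volatile field happens-before every subsequent read of that field, so a reader that has observed the flag also sees the plain writes the writer made before setting it. The class `VolatilePublish` and its members are hypothetical names, not code from this thread.

```java
class VolatilePublish {
    private int payload;            // plain field, written before the flag
    private volatile boolean ready; // volatile flag that publishes the payload

    // Writer side: plain write first, then the volatile write.
    void publish(int value) {
        payload = value;
        ready = true;
    }

    // Reader side: spins on the volatile read, then reads the plain field.
    // Once ready is seen as true, the earlier write to payload is visible too.
    int awaitPayload() {
        while (!ready) {
            Thread.yield();
        }
        return payload;
    }

    // Runs a reader thread against a writer and returns what the reader saw.
    static int demo(final int value) throws InterruptedException {
        final VolatilePublish box = new VolatilePublish();
        final int[] seen = new int[1];
        Thread reader = new Thread(new Runnable() {
            public void run() {
                seen[0] = box.awaitPayload();
            }
        });
        reader.start();
        box.publish(value);
        reader.join(); // join makes seen[0] visible to the calling thread
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw " + demo(42));
    }
}
```

The ordering guarantee covers only what the writer did before its volatile write, and only for readers that actually read that same volatile field afterwards.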
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Manoj Kulkarni wrote: I have a new concern now. Going by the memory model theory, queue.add should get executed under any circumstances, possibly after the AtomicInteger has been incremented.
But there is no evidence of the queue.add call being made, while there is evidence of the AtomicInteger increment. Is it possible for the entire call to be missed in the above prototype?


No. The call was made, or 'will be made' at some point. I reviewed your prototype to implement a safe version of it, and I think there are possibly multiple things going on as well. I think that as you add items to the batch, they are being completed, so I don't think you can say that your batch size is actually 8000 at any given time. I don't trust your counting mechanism; it is too complex to keep track of.

If your intent is to:

1) Have a pool of 100 threads performing tasks
2) Have a batch of tasks to process.
3) For each batch, process all tasks, and at the end test to see if the problem occurred
4) Clear the queue at the end of the batch
5) Repeat the batch a bunch of times

Then I think the following is a much better strategy:

The QueueTask is the same as your QueueThread, but renamed (because it isn't a Thread) and with the AtomicInteger removed.

Based on your CLinkQueueTest, except it uses a CompletionService to track the running tasks, which lets you get each task returned as it completes, then checks if the problem occurs. In this sample, I write the problem to System.err and immediately exit the app to make it more obvious the problem occurred.

I ran this code next to yours, and this runs a ton faster (~10000 iterations per minute as opposed to about 60 per minute using your code). I know speed isn't the main concern with your test, but the difference harkens to your code being more complex than needed (why is it so slow? Could be the infinite loop you have in the main thread that doesn't do anything except consume CPU cycles from other tasks... not sure). Of course, it could be that my code ends up being a lot of do-nothing, but simply putting a String in a Queue should be something you can do more than 8000 times per second.

Oh, I have run my code to completion about 12 times, and have not seen the problem. I have yet to complete a run of your code, so I don't know if I can reproduce the issue yet.
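
The code originally posted with this reply is not preserved here. What follows is only a rough sketch of the approach described, not the original: a `QueueTask` without the AtomicInteger, and a CompletionService that hands back each task as it completes. The class `CompletionSketch`, the method `runBatch`, the item strings, and the iteration count are assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class QueueTask implements Callable<String> {
    private final ConcurrentLinkedQueue<String> queue;
    private final String value;

    QueueTask(ConcurrentLinkedQueue<String> queue, String value) {
        this.queue = queue;
        this.value = value;
    }

    public String call() {
        queue.add(value); // the only work: put one String in the shared queue
        return value;
    }
}

class CompletionSketch {
    // Submits batchSize tasks, waits for each to complete, returns the queue size.
    static int runBatch(ExecutorService pool, int batchSize) throws Exception {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
        CompletionService<String> done = new ExecutorCompletionService<String>(pool);
        for (int i = 0; i < batchSize; i++) {
            done.submit(new QueueTask(queue, "item-" + i));
        }
        for (int i = 0; i < batchSize; i++) {
            done.take().get(); // take() blocks for the next finished task; get() rethrows failures
        }
        return queue.size();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        try {
            for (int iteration = 1; iteration <= 1000; iteration++) {
                int size = runBatch(pool, 8000);
                if (size != 8000) {
                    System.err.println("iteration " + iteration + ": queue size " + size);
                    System.exit(1);
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the main thread only reads the queue size after taking every completed Future, no separate counter is needed to know that the batch is finished.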
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Steve, Jayesh, Luan, thanks a lot for all the help. Though I could not reproduce the problem, the whole exercise was really informative and I learnt a lot. We will definitely have to change the tracking mechanism as suggested by Steve and Jayesh.
Manoj Kulkarni
Greenhorn

Joined: Apr 30, 2012
Posts: 23
    
    1
Finally we have identified the root cause, and it is not a bug in Java or ConcurrentLinkedQueue. The problem was that the business logic was using a Serializable class which did not have a serialVersionUID defined. Hence, in the rarest of rare cases it was throwing an InvalidClassException.
This was causing 1 thread to fail in extremely rare cases, and there was a problem in the exception handling because of which this error was absorbed, so we never figured out what was happening.
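
For anyone hitting something similar, here is a minimal sketch of the two fixes implied above: declaring an explicit serialVersionUID, and letting a task failure reach the caller instead of absorbing it. The names `BatchRecord` and `SurfaceFailures` are hypothetical, and the actual business class and exception handling in our application look different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BatchRecord implements Serializable {
    // Explicit version id, so the JVM does not have to compute one from the class shape.
    private static final long serialVersionUID = 1L;
    final String payload;

    BatchRecord(String payload) {
        this.payload = payload;
    }
}

class SurfaceFailures {
    // Serializes and deserializes a record in memory.
    static BatchRecord roundTrip(BatchRecord record) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(record);
        out.close();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return (BatchRecord) in.readObject();
        } finally {
            in.close();
        }
    }

    // Runs the task on a worker thread and returns the exception it threw,
    // or null if it completed normally.
    static <T> Throwable failureOf(Callable<T> task) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(task).get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original exception thrown inside the task
        } finally {
            pool.shutdown();
        }
    }
}
```

Checking every Future this way would have exposed the InvalidClassException on the first occurrence.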
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Excellent find. These problems can be a pain to track down. Gratz on finding it - Have a cow!
 