GeeCON Prague 2014*
The moose likes Threads and Synchronization and the fly likes Is this class thread safe? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


JavaRanch » Java Forums » Java » Threads and Synchronization
Bookmark "Is this class thread safe?" Watch "Is this class thread safe?" New topic
Author

Is this class thread safe?

Peter Hsu
Ranch Hand

Joined: Aug 25, 2006
Posts: 72
I think the following class is thread safe, but just in case...
If it isn't why not? (But I really think it is though...)



Thanks!
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

That depends on a couple of things:

1) This line:

Depending on what that code actually does, what it references, what resources it uses, and how safe all that is.

2) The type of <T>
Just because the collection data is immutable, and therefore safe to use in multiple threads, that doesn't mean the data stored in it is. Multiple threads accessing the same data in the collection could modify that data (the instances of type T) causing problems.

Generally, the access of data is safe. The collection can't escape in a mutable form, and the initial assignment from the method-local newData is atomic, so there isn't a chance to get partial assignment or half-filled / inconsistent assignment or data. In reality though, the collection is only as safe as the data in it, so if the type <T> is not safe, neither is this class.

But the least safe part of the code is the part that you left out, as highlighted by number 1) above.


Steve
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 417
    
    2

It's got some subtle nasty's ....

The assignment of data although to an immutable doesn't help the reference for data is not published, getData could in theory return null after updateData (as an example) has been called you need a write memory barrier between the assignment of updateData and a potential read from another thread of getData. updateData should have been synchronized or data volatile etc etc Synchronzing getData and updateData is by far the simplest thing so if your having difficulty just do it, I would always recommend just synchronizing all public methods in such a class or marking the class clearly as unthread safe.

I would also do as little as possible in a constructor, its generally a bad rule of thumb to publish the this pointer from a constructor anywhere another thread can get hold of it as special memory model guarantees (e.g. finals are final) apply only after you exit a ctr. In this case I can't see anything wrong as scheduling should synchronize and give you happens before ordering at that point and you have no code after the schedule and the timer task would only call back after a second anyway lol . If your not confident synchronize your public methods and keep your ctr simple, if you later get performance issues profile and address then.


"Eagles may soar but weasels don't get sucked into jet engines" SCJP 1.6, SCWCD 1.4, SCJD 1.5,SCBCD 5
Peter Hsu
Ranch Hand

Joined: Aug 25, 2006
Posts: 72
Steve Luke wrote:That depends on a couple of things:

1) This line:

Depending on what that code actually does, what it references, what resources it uses, and how safe all that is.

2) The type of <T>
Just because the collection data is immutable, and therefore safe to use in multiple threads, that doesn't mean the data stored in it is. Multiple threads accessing the same data in the collection could modify that data (the instances of type T) causing problems.

Generally, the access of data is safe. The collection can't escape in a mutable form, and the initial assignment from the method-local newData is atomic, so there isn't a chance to get partial assignment or half-filled / inconsistent assignment or data. In reality though, the collection is only as safe as the data in it, so if the type <T> is not safe, neither is this class.

But the least safe part of the code is the part that you left out, as highlighted by number 1) above.


I understand your point. T is supposed to be immutable. I should have made my code clearer. Also... I forgot... updateData is supposed to be PRIVATE as only this class is supposed to periodically update the content of the collection and no one else.... Is it thread safe now?
Peter Hsu
Ranch Hand

Joined: Aug 25, 2006
Posts: 72
Chris Hurst wrote:It's got some subtle nasty's ....

The assignment of data although to an immutable doesn't help the reference for data is not published, getData could in theory return null after updateData (as an example) has been called you need a write memory barrier between the assignment of updateData and a potential read from another thread of getData. updateData should have been synchronized or data volatile etc etc Synchronzing getData and updateData is by far the simplest thing so if your having difficulty just do it, I would always recommend just synchronizing all public methods in such a class or marking the class clearly as unthread safe.

I would also do as little as possible in a constructor, its generally a bad rule of thumb to publish the this pointer from a constructor anywhere another thread can get hold of it as special memory model guarantees (e.g. finals are final) apply only after you exit a ctr. In this case I can't see anything wrong as scheduling should synchronize and give you happens before ordering at that point and you have no code after the schedule and the timer task would only call back after a second anyway lol . If your not confident synchronize your public methods and keep your ctr simple, if you later get performance issues profile and address then.


Yes, I would perform a null check if this is production level code. However, I would not agree would making everything synchronized though. This kind of defeats (at least part of) the purpose of immutable pattern.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Peter Hsu wrote:
Steve Luke wrote:1) This line:

Depending on what that code actually does, what it references, what resources it uses, and how safe all that is.

I understand your point. T is supposed to be immutable. I should have made my code clearer. Also... I forgot... updateData is supposed to be PRIVATE as only this class is supposed to periodically update the content of the collection and no one else.... Is it thread safe now?


That <T> is supposed to be immutable does help, but there is no way to enforce that. So does making updateData() private. The safety of the class is still dependent on the missing code and the resources it uses. If another thread can use those same resources then there is a chance of breaking the updateData() method.

I also am a little worried that the class isn't final. A subclass could easily override the getData() method and replace it with something that isn't thread safe. Making that method final may be enough in this sample but in general you should prevent subclassing when possible to avoid situations where a child class could disrupt the assumed safety when referring to the parent type.
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 417
    
    2

Sorry I'm not being clear , the null check would only cover one case (don't do that).

The answer if you want to avoid synchronization is volatile or AtomicReference my point was if your struggling I wouldn't recommend them keep it simple.



Imagine the reference data starts at null and moves through references to other list X,Y,Z (other immutable lists) with each call to update , it should be obvious data (the reference) IS very mutable as its a reference (its changed value 4 times null->X, X-> Y etc) this is different to the object referenced being immutable which it is in this case. The immutable list does give you something IF the reference is correctly published on each mutation but we're not getting far enough to worry about the object being mutable.

So thread A calls updateData and mutates the reference to X , Thread B subsequently calls getData and reads a mutable reference (non final, non volatile). How does the JVM know to prohibit all reordering , cached reads , writes etc etc between the threads i.e. establish happens before order. i.e. you explicitly want the write of the reference by thread A to have happened before the read by thread B 'happens before ordering'. The volatile keyword says to the JVM this reference may be altered by another thread do what ever you need to if anything (e.g. add memory barrier) to ensure you see the latest copy. References are atomic in themselves (i.e. you never see half a reference even if 64bit) so that's not an issue regardless of volatile. If the volatile where there then we could talk about the merits of the immutable list.


Or have I missed something .... ;-)

Hope that helps

Chris

PS Threads don't care about private what would make a difference is if only one thread accessed updateData and getData.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Chris Hurst wrote:Sorry I'm not being clear , the null check would only cover one case (don't do that).

Actually, I don't think null will ever be visible. The updateData() is called in the object's constructor, so a value would have to be set to data before any thing tries to use the container instance. That leaves only the scheduled task that might have access to the data before the instance is complete, but since the updateData() is called before the timer is created (and so before the timer's thread is created). Since there is a synchronization boundary between the code that creates a thread and the start of the thread, and the updateData() occurs before that boundary the result of updateData() would be visible before the task was run even if there was no delay in executing that task.

The answer if you want to avoid synchronization is volatile or AtomicReference my point was if your struggling I wouldn't recommend them keep it simple....

So thread A calls updateData and mutates the reference to X , Thread B subsequently calls getData and reads a mutable reference (non final, non volatile). How does the JVM ... establish happens before order.


The comment at line 18 of the original code says they don't care about getting stale data. This indicates there is no need for volatile or other synchronization. The value will be set at least once before the DataContainer is finished being constructed, and using old data is OK. If there is a requirement to never use stale data then the variable should be volatile, and that would be enough.

Again: assuming the missing code is safe, which hasn't been established.

PS Threads don't care about private what would make a difference is if only one thread accessed updateData and getData.

It does help in that is makes sure the method can't be overriden and you know you control the environment in which it gets called. By making updateData() private he can demonstrate through this code all the cases in which updateData() would be called [in the ctor and in the timer task].
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 417
    
    2

No. ;-)

I would keep it simple move that task out of the constructor, simple synchronization, profile it later, optimise at the end if necessary.

Assuming we don't do that ...

The null was a red herring in that it was my example of a stale value but I will address some of your points, I guess what we are looking for is the guarantee of the visibility of write by one thread and read by the other (the initial view of the object is null's and zero's guaranteed ie you can never access garbage memory). The write would probably be visible as you say as (I had spotted the expected sync) you'd assume a sync in the implementation (was the last time I looked) which you'ed guess would give you a memory barrier, the issue is strictly the writing and reading thread should sync on the same object i.e. you want the thread calling getData to have sync'ed on the same lock as the writing thread that scheduled the task (this is explicitly stated in the memory model I believe as there are some nasty corner cases here). If you don't obey this very simple rule then you have to start thinking about "lock elision" etc etc and it all gets messy (nasty gotchas). I'm not saying it applies in this case. I was about to move onto the read semantics but that's even worse. Note if we had used synchronize on getData and updateData both thread sync on the same this lock all solved nice and simple. I guess the one I always have in my brain is how would something like a distributed JVM like Terracotta perform which support the java memory model but literally.

OK let's consider stale data i.e. its ok to see a non up to date value in general , we'll if you say you accept stale data that's kind of saying it's not a thread safe class and therefore we don't care, great let's go home ... BUT have they thought this through stale can mean NEVER, EVER seeing an up to date value. This is the classic example is it not where you for instance go into a loop polling via an unsynchronised getter the JVM spots what your doing and reorders your code such that the getter is effectively called only once at the start. I've had to fix some similar code to this it's not fun to spot (I think this example has been written up a lot in various blogs), the volatile keyword would forbid this optimisation. Note this is just one example of the evil that can happen. I would strongly advise someone to not write code like this keep it simple till it doesn't have to be.

Peter Hsu
Ranch Hand

Joined: Aug 25, 2006
Posts: 72
Great conversation, thanks Chris, Steve
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Chris Hurst wrote:... The write would probably be visible as you say as ... you'd assume a sync in the implementation ... which you'ed guess would give you a memory barrier


No, this is not a guess, probability, or implementation dependent. This is a guarantee from the memory model: JLS 17.4.4: Synchronization order

An action that starts a thread synchronizes-with the first action in the thread it starts.


Then in the next section it says anything with a synchronizes-with behavior enforces a happens-before relationship.

17.4.5. Happens-before Order
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
...
If an action x synchronizes-with a following action y, then we also have hb(x, y).


So, we know that updateData() finishing happens in the ctor's calling thread before the timer starts (in program order). Therefore updateData() happens-before timer's thread start. We know that causing timer's thread to start (in ctor's thread) synchronizes-with any action in the timer's thread - and so we know that there is a happens-before relationship between the code that causes the timer's thread to start and any possible action of the timer's thread on the container's data. And since happens-before is transitive, then we know updateData() has a happens-before relationship with any code in the timer's thread.

Chris Hurst wrote:OK let's consider stale data ... we'll if you say you accept stale data that's kind of saying it's not a thread safe class

I disagree, I think safety is about data integrity and not the liveness of data. It seems that a good portion of the java.util.concurrent collections take this approach for iteration - allowing stale data in the iterator. But I do understand the point... more important I think is what you follow up with:

Chris Hurst wrote:BUT have they thought this through stale can mean NEVER, EVER seeing an up to date value. This is the classic example is it not where you for instance go into a loop polling via an unsynchronised getter the JVM spots what your doing and reorders your code such that the getter is effectively called only once at the start.

This is perfectly legal and appropriate code reordering that can really screw things up. Setting the data variable to volatile fixes both data liveness (reads get the latest write always) and the chance that the loop could be optimized.


Peter: volatile is not a cure-all for synchronization. It would only work in this case because you care about access to a single variable, independent of any other data, and that variable is assigned to atomically. Synchronization is another approach that works. It too is not a silver bullet, but it isn't as bad as it can be made out to be. Synchronization adds very little overhead if the lock is not being contested. So this is why Chris suggests adding the synchronization preemptively and only find another way if it proves to be a bottleneck. He isn't wrong. I tend to see if there are ways to avoid it when it isn't necessary because I think its overuse can lead to unnecessary blocking. Simply a different strategy. As you can see from the discussion, there is a lot to this stuff and it is easy to miss things.
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 417
    
    2

Around the theoretical null ...

No, this is not a guess, probability, or implementation dependent. This is a guarantee from the memory model: JLS 17.4.4: Synchronization order


Synchronization order only holds if the threads are guaranteed to synchronize on the same monitor nothing in your link contradicts this and I'm saying that may not be the case.

For the casual reader better explanation of what does synchronization do ...

http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#synchronization

Note the example of the attempted memory barrier as a for instance gotcha of it not being synchronization on the same object.

Your hb logic for me breaks down when more than two threads are involved, consider if we had synchronized getData and updateData you would now have a contract that the reader and writer synchronizes-with the same monitor in this case thread safe as your link explain's. If we don't do that I think that (I believe your point) as long as the thread calling the ctr is the same that calls getData you can't even theoretically get a null as your hb logic gives (still other stale data issues but not that one) but I can't see what in the code supplied supports that a thread couldn't call the constructor and effectively leak the reference (no monitor acquistion etc) to another thread (created prior to the constructor call) that then calls getData i.e. there is no guarantee that the thread that calls the ctr calls getData and we're now looking at how those threads use monitors etc to communicate or not . establishing hb between them of which we have no guarantee. Again I'm not saying this would ever happen just the MM rules don't prohibit it ... If any thread that did not call the ctr (or sync on the timer monitor eg task executor) can see the reference returned by the ctr then I can see the guarantee that it can see all writes by the ctr ie not see the objects initial state 0's and nulls (I think this effect / possibility is best explained by blogs round the double checked lock idiom, failed work arounds)

.. and just to clarify for a casual reader we are not debating the staleness issue (i.e. that is a problem) just the validity of a specific theoretical outcome (ie one specific stale value).



Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2297
    
  49
There's a lot of detail in these posts but with reference to the "theoretical null" bit I think I'm correct in summarizing it as follows (apologies if I'm mis-quoting either of you):

The point Steve is making is that 'data' is guaranteed to be non null because updateData() must have been called and must have completed before the constructor completes and until the constructor completes there is no instance of the class available for any other thread to call getData() on.

The point Chris is making is that because the 'data' variable has not been declared as volatile and no explicit synchronization has been used then there is no happens-before relationship and so the result of the call to getData() by a different thread to the one that constructed the object could be any value that has been assigned to 'data' including the initial value which was null.

My limited understanding of the Java memory model is that, as bizarre as it may seem, the scenario described by Chris is correct. You can see this behaviour primarily because of thread level caching for which there is no guarantees about when the cache is flushed to main memory unless some form of synchronization is used.

@Peter: My advice is use standard synchronization techniques and only try and get away from having to use them if you really have a serious performance issue which you can definitely pin down to the synchronization.
 
GeeCON Prague 2014
 
subject: Is this class thread safe?