I have a HashMap that a thread updates (or adds new data to) every 30 minutes. Some web clients read data from this HashMap. Do I need to synchronize the HashMap during the update, given that the clients only read the data? If not, what could happen when the thread is updating an old value in the HashMap while a client is trying to access the key being updated?
So the scenario is that only one thread updates the HashMap, every 30 minutes, while all other clients just read from it.
What would be the most efficient solution for this scenario?
Originally posted by Vijay Rathore: [...]Do I need to synchronize this HashMap during update process as the clients are only reading the data?
Yes. During an update, the data structures inside the HashMap are updated - and may be updated quite drastically, as in an internal resize - and reading threads may be unable to access the data in the map. This is not necessarily confined to the key being updated. And even if in your particular use case it seems to work, do you want to be that tightly coupled to an implementation detail of HashMap in your particular JDK version?
A more academic point is that, looking at the Java memory model, you do in general need synchronization for the memory barriers that guarantee that modifications to the HashMap are propagated to other threads. This is unlikely to be a consideration in practice though, if only because of all the synchronization that is going on in an application server anyway.
So the scenario is, only the thread is trying to update the HashMap after 30 minutes but all other clients are just reading the HashMap. What could be the highly optimized solution for this scenario?
Given that updates are infrequent, and that object reference reads and writes are atomic, one way to realize this scenario would be to never modify the Map (thereby allowing unsynchronized access) and to update it by replacing the Map as a whole, through a volatile field (called "data" here) that refers to the current Map. This approach also ensures that the update operation is fully atomic, which is an advantage if the various bits of data in the map have some consistency requirement.
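The code that accompanied this post has not survived, so here is a minimal sketch of the replace-the-whole-Map idea rather than the original listing. The class name SnapshotCache and the methods refresh and lookup are hypothetical, and String keys and values are an assumption made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; names and types are illustrative only.
final class SnapshotCache {
    // volatile, so readers pick up the new map once the reference is replaced.
    private volatile Map<String, String> data = Collections.emptyMap();

    // Called by the single updater thread (every 30 minutes in this scenario).
    void refresh(Map<String, String> freshValues) {
        // Build the new map privately; no reader can see it yet.
        Map<String, String> next = new HashMap<>(freshValues);
        // Publish it with one reference write. It is never modified afterwards.
        data = next;
    }

    // Called by reader threads; no locking is involved.
    String lookup(String key) {
        return data.get(key);
    }
}
```

Readers see either the old map or the new one, never a map that is halfway through an update. The cost is building a complete new map on every refresh, which should be acceptable when updates happen only every 30 minutes.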
Remember to take a copy of the "data" reference every time you need it, so that (1) you minimize the number of volatile variable reads and (2) you ensure that the data you work with remains the same for the duration of your operation. In other words, don't read the "data" field several times within one operation; read it once into a local variable and work with that local variable. - Peter
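The "don't" and "do" snippets from this post are missing, so the following is only a sketch of what the advice amounts to, not the original code. The class PriceReader, its methods, and the Integer values are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not taken from the original post.
final class PriceReader {
    private volatile Map<String, Integer> data = Collections.emptyMap();

    // Updater thread: swap in a freshly built map.
    void replace(Map<String, Integer> fresh) {
        data = new HashMap<>(fresh);
    }

    // Don't: this reads the volatile field twice. If the updater swaps the
    // map between the two reads, the two lookups come from different maps.
    int totalRisky(String a, String b) {
        return data.getOrDefault(a, 0) + data.getOrDefault(b, 0);
    }

    // Do: read the volatile field once, then use only the local copy of the
    // reference, so both lookups are answered by the same map.
    int totalConsistent(String a, String b) {
        Map<String, Integer> snapshot = data;
        return snapshot.getOrDefault(a, 0) + snapshot.getOrDefault(b, 0);
    }
}
```

Both methods return the same result as long as no update happens in between; the difference only shows up when a swap lands in the middle of an operation.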
Vijay S. Rathore
Joined: Oct 29, 2001
The solution really makes sense. This was the alternate design that we thought of. Could you please elaborate on volatile variable. What could be the repercussions if we don't use volatile. Any link explaining the volatile w.r.t. Collections will help.
If you don't use volatile, then some threads might still see the old collection after you've updated the variable to refer to the new one. Threads may cache data to improve efficiency; using volatile forces the thread to look up the current value of the variable each time it's accessed, rather than using a cached value.
If some threads see the old collection while others see the new collection, this may not be a big deal. The cached value will probably be updated pretty soon anyway - usually within seconds. However, in some cases this may cause confusion. It's relatively easy to prevent this confusion by using volatile.
There's nothing special about the interaction of volatile with collections, really. If you understand how volatile works with any variable, it works the same way with a reference to a collection. But note that only the reference to the collection is volatile, not everything in the collection. This is not a problem in the scenario described above, because each collection is unchanging after it's been initialized. The only thing that changes is which collection the variable refers to - which is why the variable itself needs to be volatile. None of the internal workings of the collection are affected. [ July 21, 2004: Message edited by: Jim Yingst ]
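Since the whole scheme depends on each collection being unchanging once it is published, one optional way to enforce that is to wrap the map before assigning it. This is a sketch under assumptions: the class ReadOnlySnapshot and its methods are hypothetical, and wrapping with Collections.unmodifiableMap is a suggestion added here, not something proposed in the posts above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; illustrates that only the reference changes.
final class ReadOnlySnapshot {
    // Only this reference is volatile; the map it points to is never altered.
    private volatile Map<String, String> data = Collections.emptyMap();

    void replace(Map<String, String> fresh) {
        // Copy first so later changes to 'fresh' cannot leak in, then wrap
        // so any attempt to modify the published map fails immediately.
        data = Collections.unmodifiableMap(new HashMap<>(fresh));
    }

    // Readers get the current snapshot; mutating calls on it throw
    // UnsupportedOperationException.
    Map<String, String> current() {
        return data;
    }
}
```

With this wrapper, code that accidentally calls put or remove on a snapshot fails fast instead of silently breaking the "unchanging after initialization" assumption.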
"I'm not back." - Bill Harding, Twister
Peter den Haan
Joined: Apr 20, 2000
Originally posted by Vijay Rathore: [...] Could you please elaborate on volatile variable.
Making the variable volatile is necessary to guarantee that, after you have updated your "data" field, the new value is visible to all threads. Allow me to quote the Java Language Specification:
[The] Java programming language allows threads that access shared variables to keep private working copies of the variables; this allows a more efficient implementation of multiple threads. [...] A field may be declared volatile, in which case a thread must reconcile its working copy of the field with the master copy every time it accesses the variable. Moreover, operations on the master copies of one or more volatile variables on behalf of a thread are performed by the main memory in exactly the order that the thread requested.
Without this, there would not be any guarantee when the change to "data" would be picked up by other threads. Again, with all the synchronization going on inside an application server, the issue is probably just academic, but I would still put "volatile" in because it tells the reader something important about the way you are using this variable. The more expressive your code is, the better.
- Peter [ July 21, 2004: Message edited by: Peter den Haan ]
Vijay S. Rathore
Joined: Oct 29, 2001
Thanks Jim & Peter,
It was a great learning experience. The scenario made volatile clear to me.