aspose file tools*
The moose likes Threads and Synchronization and the fly likes Concurrent access of a HashMap Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Threads and Synchronization
Bookmark "Concurrent access of a HashMap " Watch "Concurrent access of a HashMap " New topic
Author

Concurrent access of a HashMap

V Chauhan
Ranch Hand

Joined: Nov 15, 2002
Posts: 70
Hi,

Outs is a web application deployed on weblogic 7.

We have a HashMap, which stores ~10 entries. This object is accessed concurrently from multiple threads but the access is not synchronized. The threads might get some object or pur a new mapping.

Now we are seeing some of the threads getting struck and the CPU utilization is shooting up. The thread dump of those threads points to the HashMap.

Can this unsynchronized access somehow be the cause? Hope my explanation makes some sesce.

TIA,
Basu.
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Well, I don't see immediately how any method call in HashMap would hang (is that what happens?) due to multiple unsynchronised access. That said, however, if you are doing multiple unsynchronised access, you are violating the correct usage of HashMap and can expect either ConcurrentModificationException, if lucky, or undefined behaviour if not.

You must synchronise to access this map.


Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.
Daniel Mayer
Ranch Hand

Joined: Sep 09, 2004
Posts: 103
Originally posted by Basudev Agrawal:

Can this unsynchronized access somehow be the cause? Hope my explanation makes some sesce.


Short answer: yes.

Due to the memory model of Java, unsynchronized access to memory from different threads leads to undefined behaviour.
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Its not due to the memory model. its fundamental principle of multi threaded programming.
Annekee Dufour
Ranch Hand

Joined: Nov 04, 2003
Posts: 41
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?
Arjun Dhar
Greenhorn

Joined: Sep 23, 2004
Posts: 4
The problem one wishes to avoid is that during "put()", only one thread can put at a time. The "get()" need not be protected., in this situation atleast. I think all the hassle of creating a clone etc. though seems sound in inefficient and unnecessary, so just try to synchronize the put() and see how it goes.

You could also try something else, have a read Only object that consists of the get() method and another write Only object that consists of the put() method. While using the write object, use it in a synchronized block to garuntee thread safety.
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
    2
Synchronizing the put() is not enough. If you only do that, a get() that occurs while the put() is in progress may get corrupted data.

Of course, depending on the application, perhaps you don't care if you get corrupted data a few times a day.
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?


Such attempts to avoid synchronisation usually turn out not to be safe. If you don't synchronise or use volatile, the compiler or HotSpot thinks it is at liberty to use registers, re-order operations, delay writes etc. etc. A good example of this is the "double-checked locking" anti-pattern, which looks like clever avoidance of synchronisation, but is actually horribly broken.

Have you actually proven that synchronisation would slow your application down by any significant amount? In recent JVMs, synchronisation really isn't that slow, and anyway, the bottleneck in your application is rarely where you think it is, unless you've actually done experiments and measurements.
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day),

You copy-on-write solution will be fine, however as pointed out it results in minor object churn and performance impact.

Another method is to use Doug Lea's Concurrent Library, specifically a subclass of ReadWriteLock. It allows multiple reader threads and only one writer thread. You can give preference to either type of lock acquisition or first-in-first-out thread order.

The key is that while you are still using synchronization during every read and write call, the blocking is as short as possible -- on the lock object itself. Thus all reading threads will be allowed concurrent access to the map.

Doug's package is very robust, quite complete, tested thoroughly over the past five years, and is a part of Java 1.5 -- or it's going into 1.6, I forget -- as java.util.concurrent.

Here's sample code from the JavaDoc page linked above:I just remembered (and verified) that Doug has already coded up two ConcurrentHashMaps for you to use out of the box. There are two versions (look under the Collections heading on the first page linked above).
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Warren Dew:
Synchronizing the put() is not enough. If you only do that, a get() that occurs while the put() is in progress may get corrupted data.


Even if the get occurs *after* the put it may see corrupted data! Only by synchronizing get() you are guaranteed to correctly see changes made by another thread!


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
    2
David Harkness:

I just remembered (and verified) that Doug has already coded up two ConcurrentHashMaps for you to use out of the box.

Or you could just use Collections.synchronizedMap(new HashMap()) when getting the HashMap.
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Warren Dew:
Or you could just use Collections.synchronizedMap(new HashMap()) when getting the HashMap.

While that would work, Annekee asked specifically about a solution that would work without blocking concurrent readers. Using Collections.concurrent* will allow only one reader or writer at any one time, creating an unnecessary bottleneck when there are far more readers than writers.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    
  40

While I think this effort to remove synchronization is worthwhile, there also have to be a balance. The synchronization time for contended locks is improving with each release of Java. It is now to the point where some of these efforts being considered may actually be more time consuming than the synchronization time saved.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Henry Wong:
While I think this effort to remove synchronization is worthwhile, there also have to be a balance. The synchronization time for contended locks is improving with each release of Java.

The point is not to shave milliseconds off a method call by removing synchronization or changing the object on which we synchronize. The goal is to allow concurrent access where possible and serialize only where absolutely necessary.

Any number of threads may be reading and iterating over a map concurrently with no fear of deadlock or interference. Thus synchronizing the entire map will degrade performance -- not because the synchronization keyword adds runtime overhead but rather because you're making threads wait needlessly to perform their operations.

If instead you only serialize writes -- the place where data structure corruption can certainly occur in the case of multiple writers -- you are minimizing contention. If your access pattern has reads far outweighing writes, this could be a huge win.

This is a natural extension of the "synchronize the smallest amount of code necessary" maxim.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    
  40

There are a couple of interesting issues here...

First, this is about synchronization of a container. It is about as small as you can get -- just synchronize long enough to store or retrieve an item... but the second issue is more interesting...

The amount of time to synchronize is actually not consistent. It depends on whether the lock is contended. If you try to grab a lock that is already owned, it may take milliseconds to acquire the lock after the lock is freed. If it is unowned, it will take nanoseconds... this means that you pay a very small (possibility insignificant) price if no conflict occurs. You do lose milliseconds if the lock is contended, but you have to anyway to be thread safe.

Not really providing a clear cut answer of what to do, but hope this helps,
Henry
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    
  40

One more issue... Using a reader writer lock on any class without knowing the implementation is not a good idea. At first glance, two threads should be able to read from a container simutaneously, but it is implementation specific. You don't know if the container has to calculate an index or other value internally. This means that two simultaneous reads from a container may not be thread safe.

Hope this helps,
Henry
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?


No this is quite unsafe. Your methodology is correct though. You just need to ensure that 2 threads do not try and to a put at the same time. Which would result in a lost put. Thus, you will have to syncrhonize

I always try my best to avoid synchronization. Its a good idea to do so, but if you ever accomplish it you better institute a once a week sanity check to be sure you didn't blow it. Which you probably did
Annekee Dufour
Ranch Hand

Joined: Nov 04, 2003
Posts: 41
Thanks everybody,
I guess I will be using the ReadWriteLock, or add some synchronization around the newmap. One of the reasons I did not want to use synchronization around the original map is that it will be accessed by other people's code, and they might iterate over it without synchronizing. Since there won't be that many puts, this will not be a problem most of the time, but sometimes it will be a problem.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by David Harkness:
If instead you only serialize writes -- the place where data structure corruption can certainly occur in the case of multiple writers -- you are minimizing contention.


You are right that you only need to *serialize* writes, but there is more to synchronization than that:

- You also need to make sure that noone reads while a (non-atomic) write is in progress, and
- synchronization in java is not only about serialization of access to data, but also about read/write barriers. That is, after a write, all reading threads need to synchronize, else they might see inconsistent data because of not fully uptodate local caches.
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Ilja Preuss:
You also need to make sure that noone reads while a (non-atomic) write is in progress

Using a ReadWriteLock does exactly that. It allows any number of readers or a single writer to acquire the lock.
after a write, all reading threads need to synchronize, else they might see inconsistent data because of not fully uptodate local caches.

I remember reading an article by Doug Lea discussing various ways around this: using transient, synchronizing twice, etc. IIRC, in the end his conclusion (this was from 1.1/1.2 days) was that there is no 100% solution.

Do you have an recent references on precisely how this should be done?
[ September 29, 2004: Message edited by: David Harkness ]
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by David Harkness:
I remember reading an article by Doug Lea discussing various ways around this: using transient


I guess you mean volatile?

synchronizing twice


How would that work?

Do you have an recent references on precisely how this should be done?


No, sorry.
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Ilja Preuss:
I guess you mean volatile?

LOL, yes sorry.
Me: synchronizing twice
IP: How would that work?

I wish I could find the original article I read. I suspect I'm remembering incorrectly. What I do remember is that it walked through several (four or five) iterative designs to improve the following trick.After doing some testing, he found that this does not ensure that Foo will be instantiated only once. Making foo volatile didn't ensure it either. There were a few more tricks he used -- and "synchonizing twice" probably wasn't one of them -- but IIRC ended at the same place: you cannot ensure it given the Java memory model at the time.

Now, does this apply to 1.4 or 1.5? I don't know. In any case, here is one article about the memory model that I did find and Bill Pugh's Java Memory Model page has other references.
[ September 29, 2004: Message edited by: David Harkness ]
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Yay! I found it on the Memory Model page linked above: The "Double-Checked Locking is Broken" Declaration. And it does discuss a solution using nested synchronization -- it doesn't work, but it discusses it so my memory isn't totally hosed.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Ah, the infamous DCL - I thought you might mean something different...

Thanks anyway...
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    
  40

Just did a quick lookup of the java.util.Hashmap class. Apparently it is thread safe as long as the container does not structurally change. This means that using a reader writer lock lock should work. In fact, if you are changing an element with another element, you should also just grab the read lock; even though you are changing values, you are not changing the structure of the container, so it's thread safe.

I'm surprise with this myself... learn something new everyday...

Henry
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Henry Wong:
In fact, if you are changing an element with another element, you should also just grab the read lock; even though you are changing values, you are not changing the structure of the container, so it's thread safe.


Unless the map has to resize, I assume?
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
Originally posted by Ilja Preuss:
Unless the map has to resize, I assume?

Why would the Map be resized when replacing the value for a pre-existing key?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by David Harkness:

Why would the Map be resized when replacing the value for a pre-existing key?


Of course it won't - silly me... :roll:
[ October 01, 2004: Message edited by: Ilja Preuss ]
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    
  40

This may be interesting... JDK 1.5 (a.k.a. 5.0) added a concurrent hashmap. It is threadsafe, so no external synchronization is necessary. It is highly optimized, allowing parallel reads to occur. And if I remember correctly, it is also segmented. Depending on how the keys hash, it may also allow parallel writes to occur.

So, if a fully optimize hashmap is needed, consider upgrading to 1.5.

Hope this helps,
Henry
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Concurrent access of a HashMap