JavaRanch » Java Forums » Java » Threads and Synchronization

HashMap Problem

jon london
Greenhorn

Joined: Aug 26, 2010
Posts: 8
Hi

My multi-threaded web application has a HashMap, which can contain up to 60,000 records loaded from a DB table.

The loading / refreshing of the HashMap is triggered only by the following:

-Application Startup
-Every morning (during low web traffic) we manually call a refresh function which will clear the HashMap and reload it from the DB table.

Outside of these events, the HashMap is used for retrieval and never updated. Each retrieval operation must have access to the full contents of the HashMap (i.e. no retrievals should be attempted whilst the refresh command is repopulating the HashMap, as this would cause incorrect operations in some cases).

The Class that holds the HashMap is a singleton.

I want to avoid having to synchronize the retrieval operations since this would carry a performance overhead throughout the day. Is there a way I can synchronize the HashMap only during the update / refresh operation? Whilst the update function is executing I'd want to lock all access to the Map until the update is complete, ensuring that all retrieval operations are performed against the HashMap containing the full set of data.
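For reference, a minimal sketch of the setup being described (all class and method names here are assumed, not from the original post): a singleton where every retrieval takes the same lock as the refresh, so readers can never observe a half-populated map.

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of the straightforward fully-synchronized approach.
public final class LookupCache {

    private static final LookupCache INSTANCE = new LookupCache();

    private final Map<String, String> data = new HashMap<>();

    private LookupCache() {}

    public static LookupCache getInstance() {
        return INSTANCE;
    }

    // Readers take the same lock as refresh(), so a retrieval can never
    // run against a half-populated map.
    public synchronized String get(String key) {
        return data.get(key);
    }

    // Called at startup and once each morning; freshRows stands in for
    // the rows loaded from the DB table.
    public synchronized void refresh(Map<String, String> freshRows) {
        data.clear();
        data.putAll(freshRows);
    }
}
```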
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

jon london wrote:I want to avoid having to synchronize the retrieval operations since this would carry a performance overhead throughout the day.


Don't assume that. "Synchronization is slow" is one of those 'gems' left over from the old days of Java. Don't go looking for complicated premature optimizations to solve a problem you don't even know you have yet. Start by writing dumb code. Then measure it. If it's too slow, and once you find out what in particular is causing the bottleneck and by how much it's too slow, then you can figure out what solutions might work for you.

jon london
Greenhorn

Joined: Aug 26, 2010
Posts: 8
Thanks for the reply, much appreciated.

I was concerned that my solution was not the most elegant since it obtains a lock on the Singleton for each thread that performs a retrieval from the HashMap (especially since the refreshing of the HashMap will generally take about 20 seconds each day, so locking all retrievals seemed like overkill).

I'll take your advice on board, performance test the current code and look at optimizations if necessary. Seems I'm being overly-paranoid about the cost of synchronization!
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

If you do find you're having problems--and they may stem more from the 20 sec. lock while the new map is created than from the individual quick sync-and-release operations throughout the day--a couple of things come to mind that you might want to check out.

1) The various tools in java.util.concurrent. In particular, java.util.concurrent.ConcurrentHashMap (or something like that).

2) Consider creating the new map in its own thread, then when it's ready, switch out a member variable from pointing to the old map to pointing to the new one. That would have to be in a sync block or the map variable would have to be volatile.

Not sure whether either of these will be valid for you, but keep them in mind in case you have to go down that path.
jon london
Greenhorn

Joined: Aug 26, 2010
Posts: 8
Great suggestions, thanks!

I need to test how long it takes to perform the refresh with the max number of records in the DB... If this is quick (a couple of seconds or less) then I think I'll be OK with the code as-is... If it's not, I'll look into the above suggestions.
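That timing test can be a throwaway harness along these lines; loadMaxRows() here is a stand-in for the real DB query and just fabricates 60,000 placeholder records.

```java
import java.util.HashMap;
import java.util.Map;

// Rough wall-clock measurement of a worst-case load.
public class RefreshTimer {

    // Stand-in for the real DB load at the maximum record count.
    static Map<Integer, String> loadMaxRows() {
        Map<Integer, String> rows = new HashMap<>(120_000);
        for (int i = 0; i < 60_000; i++) {
            rows.put(i, "record-" + i);  // placeholder for a real row
        }
        return rows;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<Integer, String> map = loadMaxRows();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Loaded " + map.size() + " rows in " + elapsedMs + " ms");
    }
}
```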
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181

Remember to think about data consistency across multiple gets. I don't know how your class is constructed, but if it is something like this:

- A singleton
- Methods in the singleton each access the map and send out results
- Data can be modified, replacing the map.

- Users call individual methods on the singleton
- Data needs to be consistent

Then it is not enough to make the dataHolder variable volatile or to make the methods synchronized, because the map could be changed in between the two getXxx() methods. The entire user code section would also have to be wrapped in a synchronized block. This is not the case if the user code does not need to make multiple consecutive calls to the class, or if the results of those consecutive calls don't need to be consistent with each other.
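The hazard can be sketched like this (all names hypothetical): each getter is safe on its own, but a refresh can land between two calls, so two values that must agree may come from different snapshots unless the caller holds the lock across both reads.

```java
import java.util.HashMap;
import java.util.Map;

// Each method is synchronized, yet two consecutive reads are NOT atomic
// as a pair: refresh() can run between them.
public final class QuoteHolder {

    private Map<String, Integer> map = new HashMap<>();

    public synchronized Integer getBid(String sym) { return map.get(sym + ":bid"); }

    public synchronized Integer getAsk(String sym) { return map.get(sym + ":ask"); }

    public synchronized void refresh(Map<String, Integer> fresh) {
        map = new HashMap<>(fresh);
    }

    // Caller-side fix: hold the holder's monitor across BOTH reads so no
    // refresh can slip in between them.
    public static int[] consistentPair(QuoteHolder h, String sym) {
        synchronized (h) {
            return new int[] { h.getBid(sym), h.getAsk(sym) };
        }
    }
}
```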

But if the above scenario is your situation, then I would hazard that you should not be using a Singleton, but instead a factory method which uses a cached instance, where the instance can be swapped out when required (when the data is updated).
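The code block from the original post did not survive; a hedged reconstruction of the pattern being described (the ADataSource name comes from the post below, everything else is assumed) might look like:

```java
import java.util.HashMap;
import java.util.Map;

// Factory with a cached, immutable instance: callers grab one snapshot
// and read from it; a refresh swaps the cached instance rather than
// mutating the one callers already hold.
public final class ADataSource {

    private static volatile ADataSource cached = new ADataSource(new HashMap<>());

    private final Map<String, String> data;  // never mutated after construction

    private ADataSource(Map<String, String> rows) {
        this.data = new HashMap<>(rows);
    }

    // Factory method: each caller works against one consistent snapshot.
    public static ADataSource getInstance() {
        return cached;
    }

    // A refresh builds a whole new instance and swaps it in; existing
    // callers keep reading their old, internally consistent snapshot.
    public static void refresh(Map<String, String> freshRows) {
        cached = new ADataSource(freshRows);
    }

    public String get(String key) {
        return data.get(key);
    }
}
```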


The use case would not change, but they could access the methods progressively without having to add their own synchronization since the ADataSource could not be updated out from under them. They would get their instance, and if some other thread updates the data between get()s it has no effect because it would trigger a new instance of ADataSource, not the one the user is working with.


Steve
jon london
Greenhorn

Joined: Aug 26, 2010
Posts: 8
Hi Steve

Thanks very much for your response.

Each thread would need to perform only one get() on the HashMap. These threads are generated from web requests and would never be responsible for updating data in the HashMap.

The HashMap would only be updated on server startup (so there will be no get() requests to the HashMap at this stage) and when a manual request (early each morning) is triggered to refresh the HashMap at runtime. It's during this refresh operation that I need to ensure the web request threads are reading from a full set of data... they need to wait until the HashMap has been refreshed.

I like the idea of swapping out the map once the data is refreshed.. The only concern here is that there would be 2 large objects on the heap during the swap... Albeit for a short period of time.

Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

jon london wrote:
I like the idea of swapping out the map once the data is refreshed.. The only concern here is that there would be 2 large objects on the heap during the swap... Albeit for a short period of time.


Yes, that's one of the trade-offs. However, just like with the speed, don't go borrowing trouble by assuming it's going to be too big. You said the map can hold up to 60,000 objects, and presumably you know what those objects will look like, so it's trivial to construct a 10-line program that creates 2 maps, each holding 100,000 of those objects (to provide a margin of error) and see how much heap it eats up.
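A throwaway program along those lines might look like this (class names assumed; substitute your real record class for the String payload):

```java
import java.util.HashMap;
import java.util.Map;

// Rough heap measurement: fill two maps with 100,000 placeholder entries
// each and report how much memory they retain. Runtime-based numbers are
// approximate, but fine for a "will it fit at all" check.
public class MapFootprint {

    static Map<Integer, String> fill(int n) {
        Map<Integer, String> m = new HashMap<>(n * 2);
        for (int i = 0; i < n; i++) {
            m.put(i, "record-" + i);  // stand-in for a real DB record
        }
        return m;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<Integer, String> oldMap = fill(100_000);
        Map<Integer, String> newMap = fill(100_000);

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Two maps retained roughly %d KB%n",
                (after - before) / 1024);

        // keep both maps reachable so the GC can't collect them early
        System.out.println(oldMap.size() + newMap.size());
    }
}
```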

Or we could do a back of the envelope calculation.

Assume a plain old Object is 20 bytes. Then an <Object, Object> Entry in a Map is 48 bytes--two 4-byte references + two 20-byte objects. So 200,000 entries will take at least 9,600,000 bytes. Call it 10 M, and double it to 20 M for headroom. That should be no problem, but it's worth calculating a lower bound to see if even the best case is going to be untenable.

Now let's take a larger object. Say you use the same class for key and value, and your class's member variables are 10 Strings of 100 chars each. Now, in addition to the 48 bytes, we'll add 40 bytes (10 references to String objects) + 200 bytes (10 String objects at 20 bytes each) + 200 bytes (10 char[] objects at 20 bytes each) + 2,000 bytes (10 char arrays of size 100, each char being 2 bytes). So now the total is our original 20 M + (40 + 200 + 200 + 2,000) * 200,000 * 2 ~= 996 M, so call it 1G. Still not a problem for most server-class hosts, but worth testing in your environment. (And for this kind of calculation, we could have saved ourselves some pointless nitpicking by noticing that the char[]s would dominate, so we could have just said 2,000 bytes for the chars in one object * 2 objects per Entry * 200,000 entries = 800,000,000 bytes, and then added say 10-20% for the small stuff we left out.)

Obviously these numbers are rough guesses, and the real values will depend on JVM implementation, your specific data, a bunch of smaller stuff I left out, and whatever math errors I made along the way. But they give us a reasonable idea of whether the scale is "no problem," "should be okay," "probably a tight fit," or "out of the question." Without knowing more details, I'd say this looks like, "should be okay."
 