This was originally in the Servlets forum, so apologies for the servlet-specific details here. But this is basically a synchronization/threading issue, so I've moved it here.
So I have this certain database table. It hardly ever changes. To optimize the system a bit, I thought I might 'cache' the table. Just to see. But when the table *does* update, it's important the application get updated as well.
I can tell when that table is updated, because it only changes as the result of a form submission or some other process where I can programmatically say "OK, and now you need to update the cache".
I've figured out (in my head) most of the parts.
* Use a ServletContextListener to detect startup, run a query, and cache the results in a Map placed in 'application' scope.
* Use the cached copy by retrieving the Map from 'application' scope.
* Programmatic updates to the cache can occur when required.
* Manual updates can be triggered through a 'touch' URL of some sort, which then invokes the cache-updating logic.
This is where it gets tricky... When I need to update the cached Map, I want to ensure that no other thread is currently reading it, right? So I need to synchronize access to the Map. But this is an infrequent change, so do I want the synchronization overhead all the time?
Is there a pattern used to 'flag' something as 'currently being changed, please wait'? (Is that the pattern I'm looking for?)
Do I even need to worry about synchronization? [ December 04, 2003: Message edited by: Mike Curwen ]
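A read/write lock is one standard answer to the "currently being changed, please wait" question: any number of readers can hold the read lock at once, while a writer waits for them to finish and blocks new readers until it is done. Below is a minimal sketch using ReentrantReadWriteLock from java.util.concurrent (added in Java 5, after this thread was written); the class LockedCache and its methods are hypothetical names. Readers still pay a small cost to take the read lock, but they never block one another, only the rare reload.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache guarded by a read/write lock: many readers may hold the
// read lock simultaneously; a writer gets exclusive access for the reload.
class LockedCache<K, V> {
    private final Map<K, V> data = new HashMap<K, V>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Replaces every entry while holding the write lock, so no reader can
    // observe the map between clear() and putAll().
    void reload(Map<K, V> fresh) {
        lock.writeLock().lock();
        try {
            data.clear();
            data.putAll(fresh);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```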
You not only need to worry about synchronization, but also about the Map being accidentally modified (or updated by another developer who doesn't understand how it's supposed to be updated). Though you may be right that it rarely gets updated, the very thought of troubleshooting the intermittent, impossible-to-reproduce bugs that may result should persuade you to deal with the issue up front.

You can guard against both concurrent modification and unauthorized access by wrapping your Map in another object that controls access to it. If you need to return the whole map to the caller, be sure to pass it through the Collections.unmodifiableMap() method. You'll also need to use the Collections.synchronizedMap() method. And after reading the API comments on synchronizedMap() (iterating over the returned map must be manually synchronized on that map), you'll probably decide you don't want to pass the Map back to the caller at all. Instead, have callers pass the key to the wrapper and have the wrapper return the value from the Map. And when you force developers to call your wrapper object every time they need a value from the map, you never need to worry about notifying them that a map they've held on to has been updated.

Lastly, you may find that this wrapper object becomes a convenient place to stuff other application-wide constants (and nearly-constants, as in the case of your Map). You may want to explicitly dissuade others from doing this (or maybe not). Just keep that in mind as you name and design your wrapper object. [ December 04, 2003: Message edited by: Mark Latham ]
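Mark's wrapper idea might look roughly like this; LookupTable, get() and refresh() are hypothetical names. Collections.synchronizedMap() guards each individual call with a lock on the returned map, so a multi-step update (clear, then refill) has to synchronize on that same map for readers never to see it half-empty. Because the Map itself never leaves the wrapper, callers can't modify it or iterate it without the lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: callers look values up by key and never get the Map.
final class LookupTable<K, V> {
    private final Map<K, V> map = Collections.synchronizedMap(new HashMap<K, V>());

    // A single call on a synchronizedMap is already guarded by the map's lock.
    V get(K key) {
        return map.get(key);
    }

    // Compound update: hold the same lock the synchronized wrapper uses
    // (the returned map itself), so get() can't run between clear() and putAll().
    void refresh(Map<? extends K, ? extends V> fresh) {
        synchronized (map) {
            map.clear();
            map.putAll(fresh);
        }
    }
}
```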
Another strategy to prevent cache updates outside the design ... don't have any update methods! Instead, provide an invalidate method. Invalidate tells the cache to refresh itself from the database either immediately or the next time somebody asks for a value. I just read about this in something Kyle Brown wrote: http://www.javaranch.com/newsletter/200311/Journal200311.jsp#a10 He was talking about distributed caches, which is not a bad thing to think about in case your app ever runs on redundant servers or JVMs.
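A minimal sketch of the lazy "invalidate now, reload on the next request" variant, assuming a java.util.function.Supplier stands in for the database query (InvalidatingCache and loader are hypothetical names). It deliberately has no put-style methods, so nothing outside the class can change the cached contents. Every get() takes the object's monitor, which keeps the sketch simple at the cost of serializing readers.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache with no update methods, only invalidate().
final class InvalidatingCache<K, V> {
    private final Supplier<Map<K, V>> loader; // assumed stand-in for the DB query
    private Map<K, V> snapshot;               // guarded by "this"
    private boolean stale = true;             // guarded by "this"; forces first load

    InvalidatingCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
    }

    // Only marks the data stale; the reload happens on the next get().
    synchronized void invalidate() {
        stale = true;
    }

    synchronized V get(K key) {
        if (stale) {
            // Copy and wrap so the cached data can't be changed behind our back.
            snapshot = Collections.unmodifiableMap(new HashMap<K, V>(loader.get()));
            stale = false;
        }
        return snapshot.get(key);
    }
}
```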
Stan brings up an excellent point. Whether or not you have any update methods, you must take into account the possibility of your application running on multiple app servers pointing at a single database. If the table is very rarely updated, however, it may be an acceptable business requirement to say the app servers must be cycled before the database update is propagated to them. Not necessarily what I'd recommend, but it's another option.
One trick I've used in the past where data changes very rarely and is relatively expensive to reload: add a single-value table to the database containing the date/time of the most recent change, and make sure it is updated whenever the data is changed. Then any cache can check for changes relatively frequently (say once a day, once an hour, even once a minute), but needn't bother reloading unless the timestamp has changed since it last looked.
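One way to sketch the timestamp trick, assuming a LongSupplier wraps the query for the single-value "last changed" row and a Supplier wraps the expensive full reload; TimestampCheckedCache and its parameters are hypothetical. The cache asks for the timestamp at most once per interval and reloads only when the value differs from the one it last saw.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache that polls a cheap "last changed" value and reloads the
// expensive table only when that value has moved.
final class TimestampCheckedCache<K, V> {
    private final LongSupplier lastChanged;   // assumed: reads the one-row timestamp table
    private final Supplier<Map<K, V>> loader; // assumed: the expensive full reload
    private final long checkIntervalMillis;

    private Map<K, V> snapshot = Collections.emptyMap();
    private boolean loaded = false;
    private long loadedVersion;
    private long nextCheckAt = 0L;

    TimestampCheckedCache(LongSupplier lastChanged, Supplier<Map<K, V>> loader,
                          long checkIntervalMillis) {
        this.lastChanged = lastChanged;
        this.loader = loader;
        this.checkIntervalMillis = checkIntervalMillis;
    }

    synchronized V get(K key) {
        long now = System.currentTimeMillis();
        if (now >= nextCheckAt) {
            nextCheckAt = now + checkIntervalMillis;
            long version = lastChanged.getAsLong(); // cheap query
            if (!loaded || version != loadedVersion) {
                snapshot = Collections.unmodifiableMap(new HashMap<K, V>(loader.get()));
                loadedVersion = version;
                loaded = true;
            }
        }
        return snapshot.get(key);
    }
}
```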
The problem is... what if some other thread wants to call setNewCache(new_map)?
Is assignment not an atomic operation? (Or only for primitives?) Is it possible for the cache Map inside my Cache class to be in some undetermined, 'in-between' state?
Bringing back the getCache method: if I retrieve the unmodifiable Map, what happens when some other thread, in the meantime, completely swaps out the underlying Map? A ConcurrentModificationException?
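On the two questions above: the Java Language Specification (§17.7) guarantees that reads and writes of object references are atomic, so no thread can see a half-assigned reference. What plain assignment does not guarantee is that other threads see the new reference promptly; declaring the field volatile adds that. Swapping the reference also leaves the old map untouched, so a thread still holding it reads a consistent, if stale, copy. ConcurrentModificationException typically comes from a collection being structurally modified while it is iterated, which never happens to a map that is no longer written after publication. A sketch, assuming a Cache class shaped like the one described (its original code is not shown in the thread):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Cache with the setNewCache/getCache methods the question names.
final class Cache<K, V> {
    // volatile: getCache() on any thread sees the most recently published map.
    private volatile Map<K, V> current = Collections.emptyMap();

    // Copy first, then publish with one reference write. The copy keeps later
    // changes the caller makes to newMap from leaking into the cache.
    void setNewCache(Map<K, V> newMap) {
        current = Collections.unmodifiableMap(new HashMap<K, V>(newMap));
    }

    // Returns an immutable snapshot. A later setNewCache() replaces the field
    // but never modifies a snapshot a caller is already holding.
    Map<K, V> getCache() {
        return current;
    }
}
```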
FYI: concerns about distribution are 'gold-plating' around here, and even this whole discussion is probably a waste of 'valuable company resources', so I'm not worried at all about multiple servers.
I'm thinking this isn't really a servlet question anymore (if it ever really was...) So I'm moving it to the threads/sync forum.
With a fairly small cache, you might be fine replacing the whole thing. I did exactly that with some configuration loaded from properties files on one system. But if you have zillions of rows and frequent updates, you could apply the "invalidate" flag idea to one row at a time. We recently designed (but never built) an administrative tool to update some control data in the application that had a "commit" button. A user could make multiple changes and refresh the cache only once, when they were all done. That seemed necessary because the cache was already built to re-read everything from the database, which was too expensive to do after every little update. Invalidating one row would have been a much better idea.
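Per-row invalidation might be sketched with a ConcurrentHashMap; RowCache, rowLoader, invalidate() and invalidateAll() are hypothetical names, and rowLoader stands in for a single-row database query. ConcurrentHashMap.computeIfAbsent() applies the loader at most once per missing key (other callers for that key may block briefly while it runs), so invalidating one row costs only that row's reload. invalidateAll() covers the "refresh everything once after a batch of edits" case.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-row cache: rows load on demand and can be dropped one at a time.
final class RowCache<K, V> {
    private final ConcurrentHashMap<K, V> rows = new ConcurrentHashMap<K, V>();
    private final Function<K, V> rowLoader; // assumed: loads one row by key

    RowCache(Function<K, V> rowLoader) {
        this.rowLoader = rowLoader;
    }

    // Loads the row only if it is missing (never loaded, or invalidated).
    V get(K key) {
        return rows.computeIfAbsent(key, rowLoader);
    }

    // Drops a single row; the next get() for this key reloads just that row.
    void invalidate(K key) {
        rows.remove(key);
    }

    // Drops everything, e.g. after an admin "commit" of several changes.
    void invalidateAll() {
        rows.clear();
    }
}
```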
subject: How do I know when I require synchronized access to cached data?