I am developing a clustered caching services (I know there are many caching services available on the net, but I want to do it just to know what all it takes to develop such a service from scratch) so that any clustered java applications shuld be able to put and retrieve objects from cache within its own JVM.
Clustered Cache should support following features: (a) Client should be able to place/retrieve any object in to the cache. (b) The application has objects that it wants to store in the cache for different periods ranging from few minutes to hours/days (c) The caching service has a limit on the number of objects cached. The client application should not be affected by this limit imposed on the cache. The cache should support pluggable cleanup mechanisms using algorithms like least-recently used (LRU) or least-frequently used (LFU) etc. (d) The caching service should ensure maximum serviceability by cleaning up expired objects appropriately.
I am very much aware of caching and clustering concepts. Now what I am not very sure about is that, how these cached deployed on the different managed servers will be kept in synch so that users don�t get the stale data??
What I have thought so far is that, I will design this clustered cache as a typical single JVM cache(normal cache which just runs fine on single JVM), and i will have to design another component probably using Java Messaging Services to keep these different cache in to the synch. This component will run asynchronously and shall send a message to a JMS Queue and all the clustered cache will listen to that particular JMS queue. As and when any message arrives at the queue, all the cache shall update their status, this way client shall never get an stale data.
Apart from this I want to go after this problem in a typical OOD way using typical design principles like Open/Close principle (extendibility remain high), Dependency Inversion principle (so that coupling between the various classes/component can be kept to a minimum as far as possible). Apart from this, I also want to use various GoF design patterns too. All in all, it has to be a very flexible design which can be maintained over time without much hassles.
Can anybody please comment on my JMS thingy approach? Does that really sounds like a mature design or is these any better way of doing the same.
When an item is added or changed you might broadcast the whole cached object or only the key. Sending the whole object could be expensive or impossible if it's not serializable. If you send only the key, other caches simply remove the item; the client gets "not found" on the next get and then loads from some persistant store (eg database) and puts it in the cache. We briefly tried this for user profile updates and finally removed the data from the cache entirely.
What if one server in the cluster stops and restarts? I think it would have to ask for a refresh and a re-broadcast of everything in cache.
What if two servers update the same item at the same time? Which one should "win" in updating the others in the cluster?
I'm sure there are more gotchas. Review the doc on existing solutions for some more ideas.
See this blog about space based architecture. The white paper shows how far people are taking this distributed memory idea. It's pretty cool.
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
I've been thinking about this and there seem to be several forces competing for a solution.
One force is the usual cache issue of retrieval efficiency - an optimal system will never fetch a remote item if it is already cached.
One force is communication efficency - an optimal system will never communicate unnecessary data.
Another force is the concept of "stale data".
The first two forces seem generally resolvable, in the sense that a compromise solution which reloads slightly more than absolutely necessary, or a system which communicates slightly more than absolutely necessary will not be a "hard" (data) failure only a "soft" (throughput) degradation.
The third force, however, needs much more care, as an incorrect design could lead to "hard" data errors.
When I first scanned through your question, I assumed that this was only a read cache, for essentially static data. If you plan to "write through" a distributed cache, the only aproach I have encountered which attempts to guarantee data integrity is one where a "lock" is broadcast (and acknowledged) for the item before writing, then released when writing is complete. Receipt of a "lock" message forces each system to dump the cached item and not reload it or access it until the lock is released.
If writing is common, this can be a large communication overhead, and any "bottleneck" items in the system which require frequent writes can cause the overall system to actually run slower than a single non-distributed application.
Ignoring the write-through issues for the moment, how vital is it that each collaborating system has exactly the same items cached? A two-level cache-on-demand system can often be pretty effective:
When a syatem wishes to read an item, it first looks in its own cache, if not found it asks its collaborators, if not found there it fetches the real item, which is then automatically avalable to olther collaborators as a second-level cache.
This may make sense if the overhead of fetching the real item is large compared to the overhead of communicating with a small number if closely-coupled systems. In particular, because communication only takes place when a system needs an item, there is no need for continual broadcast of cache updates.
the only aproach I have encountered which attempts to guarantee data integrity is one where a "lock" is broadcast (and acknowledged) for the item before writing, then released when writing is complete.
Cool beans. I'm more glad than ever we didn't try to write one of these.
There was a distributed cache group working on a Java standard. I guess it stalled a few years ago? Haven't heard anything about it for a while. The head cheeze at Tangosol was leading it, I think.
Joined: Aug 20, 2006
Thanks Frank, Your reply was really very informative. I liked the idea of lock broadcasting for updateable distributed cache. I have to think of my design in that prospective as my cache is going to be updateable too.
I understand that the various forces which you mentioned in your post are basically trade offs.
Originally posted by Frank Carver
When I first scanned through your question, I assumed that this was only a read cache, for essentially static data.
Not really.. It should be updateable too. Client should be able to place/retrieve any object in to the cache.
I am still thinking my design, When it comes into a good presentable shape, I would love to share the same with everyone on javaranch.
I sincerely thank Stan and Frank for their valuable inputs
Joined: Jan 07, 1999
It should be updateable too. Client should be able to place/retrieve any object in to the cache.
Maybe I'm reading that wrong, then. To me, that is saying that any client system can load items from the remote system and place them in the cache, as well as retrieving items from the cache. I still consider that a "read cache".
Write-through issues occur when any of the clients can update the contents of a cached object after the cached object is placed in the cache, and/or directly write it back to the remote system. Without this, distributed caching is relatively easy.
As an aside, one of the most popular distributed caching systems in the world, bittorrent, is built on the premise that typical data items are big enough and sources slow/busy/flaky enough that cacheing and sharing partially loaded data is also useful. Is your proposed system likely to need this, or can you guarantee that all cached data items will be very small and relatively quick?