Concurrency and caching

Sunny Leung
Greenhorn

Joined: Jun 29, 2010
Posts: 5
Hi all, I'm relatively new to threads and concurrency, so please be gentle

Here is my scenario:

1) I have a servlet-based application serving data from a backend that is slow to respond, for reasons outside my control
2) My goal is to cache the results by hitting the server at the application's startup and periodically updating the cache, to reduce the wait time for users
3) The results are displayed in a grid-type component. For reasons outside my control, the grid automatically makes a request to the server for an initial view (say, rows 0-50), then straight afterwards makes another request for the next batch (rows 51-100) as its own internal cache, to allow for smoother scrolling

My cache implementation is along the lines of:
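(The real code is longer, but it is roughly this shape; the class, Backend interface and method names here are simplified placeholders rather than the actual production names:)

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ResultCache {

        // stand-in for the slow backend call that returns the full data set
        public interface Backend {
            List<String> fetchAll(String query);
        }

        private final Map<String, List<String>> cache =
                new ConcurrentHashMap<String, List<String>>();
        private final Backend backend;

        public ResultCache(Backend backend) {
            this.backend = backend;
        }

        // fast path: served straight from the ConcurrentHashMap, no locking here
        public List<String> getRows(String query, int from, int to) {
            List<String> rows = cache.get(query);
            if (rows == null) {
                rows = cacheMiss(query);                 // slow path
            }
            return rows.subList(from, Math.min(to, rows.size()));
        }

        // synchronized so that the grid's second request waits for the first
        // one to populate the cache instead of hitting the backend again
        private synchronized List<String> cacheMiss(String query) {
            List<String> rows = cache.get(query);        // re-check inside the lock
            if (rows == null) {
                rows = backend.fetchAll(query);          // the slow call
                cache.put(query, rows);
            }
            return rows;
        }
    }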


Note: when I hit the server, I fetch a large data set, and code elsewhere returns the appropriate rows (by List.subList'ing).

The reason I needed this was the second request from the grid. The first request would cause a cache miss, and the server would be hit. While that was happening, the second request would also cause a cache miss (the first request would not return fast enough) and cause the server to be hit again. Since I hit the server for a large data set, each real miss caused two hits. So I decided to synchronize the cacheMiss method so that while one is in progress, another cacheMiss can't take place.

Have I "correctly" made use of synchronized here? Is there a better/faster implementation I could be doing? Does the use of ConcurrentHashMap allow multiple cacheHits to happen?

Also, what's the best way to renew the cache while minimising the impact on performance? My thoughts were to have a thread in the background that creates separate variables (tempCache and tempLFU), and when the server responds:
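(Roughly like this; tempCache and tempLFU are the copies built by the background thread, and the field and method names are just illustrative:)

    // the live structures that readers use
    private Map<String, List<String>> cache = new ConcurrentHashMap<String, List<String>>();
    private Map<String, Integer> lfu = new ConcurrentHashMap<String, Integer>();

    // called by the background refresh thread once the server has responded
    private void swapInRefreshedCache(Map<String, List<String>> tempCache,
                                      Map<String, Integer> tempLFU) {
        cache = tempCache;   // point readers at the freshly built copies
        lfu = tempLFU;
    }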

Do I need to synchronize this method to ensure it's performed atomically?

Sorry for the barrage of questions in this post.

Thanks for reading!
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

I would say that if you need a cache for production deployments, it is worthwhile to look at products like ehcache that give you a more professional implementation of a cache.
Caching is one of those topics that can very easily make you believe "it is simple", but when you get into it, there are a lot of things you have to consider, e.g.:

1) How do you evict an element from the cache?
2) How does your cache work in a distributed environment?
3) How do you make it live through restarts?

And the list continues.


Sunny Leung
Greenhorn

Joined: Jun 29, 2010
Posts: 5
Thanks for the input - I've started looking at some open-source caching libraries (JBoss Cache, JCS, and cache4j) to see if they'll be suitable.

However, for the sake of having a good understanding (rather than delegating the work to a black box), I'm still interested in knowing whether my above attempt is good for (or not good for) what it's trying to do.

1) How do you evict an element from the cache?

In my POC, I implemented a simple LRU algorithm to evict elements from the cache. This application will have several grids with different use cases, and the plan is to implement, where necessary, a caching mechanism suitable for each use case (whether that means no caching is needed, or that the cache only has to be refreshed from the source once an hour).

2) How does your cache work in a distributed environment?

I cannot answer this with certainty, but let's assume it won't be used in a distributed environment.

3) How do you make it live through restarts?

In our use case, the cache doesn't have to live through restarts. Having the cache repopulate from the source of truth is good enough (which sometimes is a DB, and sometimes isn't).

Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Sunny Leung wrote: I'm still interested in knowing whether my above attempt is good for (or not good for) what it's trying to do.


Sure, it is always better to understand things bottom up.

Sunny Leung wrote: Have I "correctly" made use of synchronized here?


Yes and no.

Yes, because you will achieve your goal of making the other thread wait till you populate the cache.
No, because all threads (irrespective of the key they are populating) will wait till any one of the threads has finished its work. This will become a bottleneck in high-throughput systems (if yours is one).
If you have a monitor associated with a key, you can lock on that monitor instead of making the entire method synchronized. This will reduce your lock contention.

Sunny Leung wrote:Does the use of ConcurrentHashMap allow multiple cacheHits to happen?


Yes, it does. ConcurrentHashMap reads and writes do not lock the entire map, only a few segments of the map.

Sunny Leung wrote: Also, what's the best way to renew the cache while minimising the impact on performance


Will you refresh the entire cache at once? Do all the cache entries expire at the same time?
ConcurrentHashMap will give you good performance for concurrent writes and reads, so I think you should be able to update the existing cache entry by entry, concurrently with readers reading from the cache.
Sunny Leung
Greenhorn

Joined: Jun 29, 2010
Posts: 5
Nitesh Kant wrote: If you have a monitor associated with a key, you can lock on that monitor instead of making the entire method synchronized. This will reduce your lock contention.

Wouldn't implementing a monitor associated with a key just introduce more problems? (i.e., problems with concurrent access to the structure storing the lock status of the keys, a HashMap<MyKey, Boolean> perhaps) Or is there something built-in that I should be looking at?

Nitesh Kant wrote: Will you refresh the entire cache at once? Do all the cache entries expire at the same time?

In one case, that's what's happening. It's like presenting a set of log entries (e.g., ConcurrentHashMap<Date, String>) on the server - all the clients can only see the cached copy of the log entries no matter how many times they refresh. It's only when the cache refreshes from the source that they'll see new data. (This is acceptable in our use case.)

Nitesh Kant wrote: ConcurrentHashMap will give you good performance for concurrent writes and reads, so I think you should be able to update the existing cache entry by entry, concurrently with readers reading from the cache.

I think I will have to be careful here depending on the use case. In the above log-entries example, being log entries, we never actually update the data - only new entries get added. If I were to keep the cache at a fixed size of 1000 entries, then the act of removing an entry and adding one would have to be atomic to maintain 1000 entries at all times (i.e., the method that removes and adds would have to be synchronized).
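(Roughly what I mean; cache here is the ConcurrentHashMap<Date, String> from the log example, and oldestKey() is a made-up helper that finds the entry to evict:)

    // evict + add under one lock, otherwise readers could briefly see
    // 999 or 1001 entries instead of exactly 1000
    public synchronized void addLogEntry(Date timestamp, String line) {
        if (cache.size() >= 1000) {
            cache.remove(oldestKey());   // oldestKey() is a hypothetical helper
        }
        cache.put(timestamp, line);
    }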

It just seems to me there'll need to be a synchronized method somewhere down the line that will become a bottleneck?
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Sunny Leung wrote:
Nitesh Kant wrote: If you have a monitor associated with a key, you can lock on that monitor instead of making the entire method synchronized. This will reduce your lock contention.

Wouldn't implementing a monitor associated with a key just introduce more problems? (i.e., problems with concurrent access to the structure storing the lock status of the keys, a HashMap<MyKey, Boolean> perhaps) Or is there something built-in that I should be looking at?


I was referring to this as a monitor. So instead of synchronizing the entire method like:
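(A sketch with placeholder names, reusing the cache and backend stand-ins from the earlier snippet:)

    // whole-method synchronization: every thread queues on the same lock,
    // no matter which key it is trying to load
    public synchronized List<String> cacheMiss(String key) {
        List<String> rows = cache.get(key);
        if (rows == null) {
            rows = backend.fetchAll(key);
            cache.put(key, rows);
        }
        return rows;
    }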


you can have a synchronized block (or a reentrant lock if you are using the concurrent package in Java):
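(Again a sketch with the same placeholder names; the per-key lock objects are kept in a ConcurrentHashMap, and putIfAbsent ensures only one lock object ever exists per key:)

    private final ConcurrentHashMap<String, Object> locks =
            new ConcurrentHashMap<String, Object>();

    public List<String> cacheMiss(String key) {
        Object lock = locks.get(key);
        if (lock == null) {
            Object candidate = new Object();
            lock = locks.putIfAbsent(key, candidate);  // atomic: the first lock stored wins
            if (lock == null) {
                lock = candidate;
            }
        }
        synchronized (lock) {                          // only threads after the same key wait here
            List<String> rows = cache.get(key);
            if (rows == null) {
                rows = backend.fetchAll(key);
                cache.put(key, rows);
            }
            return rows;
        }
    }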


Sunny Leung wrote: If I were to keep the cache at a fixed size of 1000 entries, then the act of removing an entry and adding one would have to be atomic to maintain 1000 entries at all times (i.e., the method that removes and adds would have to be synchronized).


I do not know your use case in much detail, but if I were to make a cache I would not be stringent about having exactly 1000 entries. I would make the maximum number of entries "n" and be tolerant of "n +/- m" if that improved my performance based on lock contention. Of course, this is just my view, YMMV.
Sunny Leung
Greenhorn

Joined: Jun 29, 2010
Posts: 5
Nitesh Kant wrote:So instead of synchronizing the entire method like:


you can have a synchronized block (or a reentrant lock if you are using the concurrent package in Java):


ahh I see! That makes perfect sense.

Thanks a lot for your explanations!
 