Distributed Caching

 
Anand Sid
Greenhorn
Posts: 7
Hi -
I am in the initial analysis phase of a project in which we are going with ORM. We have shortlisted Toplink as the provider.

"I would like to know which strategy is best suited for handling caching for descriptors that are updated quite often in a clustered environment. For example, a change is made on server A while server B is reading the same data from its cache, which is now stale."

I used to work with ATG Repository, in which we used distributed JMS (SQLJMS). When we update the descriptor and commit it to the DB, we send a JMS message to a JMS topic. The other servers in the cluster pick it up and invalidate their cached copy of the changed record. Should I be using a similar approach here?
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Personally, the more transactional and frequently changing the data is, the less I would cache it at all. Pretty soon all your resources go to updating all the servers with the latest data, leaving no processing power to support your clients, so the drag becomes worse than simply going to the database to get your data.

Data that remains the same gives the highest ROI when loaded into a distributed cache.

Now there are a good number of distributed caches out there, and I believe pretty much all of them should be usable with Toplink.

"We have shortlisted Toplink as the provider."

That looks more like you chose Toplink rather than shortlisted it.

Mark
 
James Sutherland
Ranch Hand
Posts: 553
TopLink (and EclipseLink) provide several features for handling stale cached data. Which feature you choose depends on how often your data is updated and on your application's level of tolerance for stale data.

Regardless of which solution you choose I would recommend you use an Optimistic Locking Policy to ensure that no write will occur on stale data. Even if you are not caching at all this is still important.
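
To illustrate what an optimistic locking check buys you, here is a minimal plain-Java sketch. It is not TopLink code: VersionedStore and Row are hypothetical names, and the in-memory map only stands in for a table with a version column. The update method mimics an "UPDATE ... WHERE id = ? AND version = ?" statement, so a writer holding a stale copy is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a database table with a version column.
class VersionedStore {
    // One stored row: a value plus the version number used for the optimistic check.
    static final class Row {
        final String value;
        final long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    // New rows start at version 1.
    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 1L));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    // Mimics "UPDATE ... SET value = ?, version = version + 1 WHERE id = ? AND version = ?".
    // Returns false when the row is missing or the caller's copy is stale.
    synchronized boolean update(long id, String newValue, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        rows.put(id, new Row(newValue, current.version + 1));
        return true;
    }
}
```

If two users read the same row and both try to write, the first update succeeds and bumps the version, and the second returns false because its expected version no longer matches. A real ORM would typically surface that as an optimistic lock exception rather than a boolean.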

If your data is frequently updated and you want to avoid any stale data, then consider turning off the cache by using an Isolated cache in TopLink for that class. Note that in TopLink you can configure the cache type for each class, so read-only or less frequently updated classes could still use caching.

If you are less stringent on avoiding stale data, you could set a Cache Invalidation Policy in TopLink to timeout stale data after a threshold of milliseconds. Again in TopLink this can be configured for each class, allowing different classes to have different thresholds.
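
The idea behind a time-based invalidation policy can be sketched in plain Java as below. This is only an illustration of the concept, not how TopLink implements it: TtlCache is a hypothetical class, and the clock is injected as an assumption so the expiry logic can be exercised without sleeping. In TopLink itself you would set an invalidation policy on the descriptor instead (I believe the time-to-live variant is TimeToLiveCacheInvalidationPolicy, but check the documentation for your version).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical cache whose entries expire a fixed number of milliseconds after loading.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns the cached value, or null when the entry is absent or has reached its
    // time to live. A null result means the caller should reload from the database.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.loadedAtMillis >= ttlMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}
```

Using one cache instance per class with its own ttlMillis mirrors the per-class thresholds mentioned above: a short threshold for volatile classes, a long one for reference data.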

Another option is to use refresh on the queries where you require up-to-date data. This can be used in combination with the TopLink descriptor setting onlyRefreshCacheIfNewerVersion() and optimistic locking, so that an object is refreshed only when it has actually been updated.
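
As a rough sketch of the "refresh only if newer" rule, the plain-Java class below compares version numbers before replacing a cached object. RefreshingCache and Order are hypothetical names used only for illustration; this is not the TopLink implementation. Note that the row still has to be read from the database on every refreshing query. The version comparison only decides whether the cached object gets replaced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that applies a "refresh only if newer version" rule.
class RefreshingCache {
    // Immutable snapshot of an order row, including its optimistic-lock version.
    static final class Order {
        final long id;
        final String status;
        final long version;

        Order(long id, String status, long version) {
            this.id = id;
            this.status = status;
            this.version = version;
        }
    }

    private final Map<Long, Order> cache = new HashMap<>();

    // Merges a row just read from the database into the cache.
    // The cached copy is replaced only when the database version is strictly newer.
    // Returns the instance the cache holds afterwards.
    Order refresh(Order fromDatabase) {
        Order cached = cache.get(fromDatabase.id);
        if (cached == null || fromDatabase.version > cached.version) {
            cache.put(fromDatabase.id, fromDatabase);
            return fromDatabase;
        }
        return cached;
    }

    Order cached(long id) {
        return cache.get(id);
    }
}
```

When the version has not changed, the existing cached instance is kept as-is, which avoids rebuilding the object on every read.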

If the data is less frequently updated, you can use TopLink Cache Coordination in a cluster (distributed caching). TopLink supports Cache Coordination over several protocols, including JMS, RMI, RMI-IIOP and CORBA. I would recommend using JMS. Be careful using this if the data is frequently updated, as it may degrade performance, but this depends on how fast the connection between the machines in the cluster is compared with the connection to the database. Again, in TopLink this can be configured for each class, allowing some classes to use a coordinated cache and others to use other mechanisms to handle stale data.
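
The invalidation flavour of cache coordination can be sketched without any JMS dependency as below. Everything here is an assumption made for illustration: InvalidationTopic is an in-process stand-in for a JMS topic, CacheNode stands in for one server with its own local cache, and the shared map plays the role of the database. Real coordination is asynchronous and goes over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical in-process stand-in for a JMS topic: every subscriber sees every message.
class InvalidationTopic {
    private final List<LongConsumer> subscribers = new ArrayList<>();

    void subscribe(LongConsumer subscriber) {
        subscribers.add(subscriber);
    }

    void publish(long changedId) {
        for (LongConsumer subscriber : subscribers) {
            subscriber.accept(changedId);
        }
    }
}

// Hypothetical server node with a local cache in front of a shared "database".
class CacheNode {
    private final Map<Long, String> localCache = new HashMap<>();
    private final Map<Long, String> database;
    private final InvalidationTopic topic;

    CacheNode(Map<Long, String> database, InvalidationTopic topic) {
        this.database = database;
        this.topic = topic;
        // On an invalidation message, drop the local copy so the next read reloads it.
        topic.subscribe(id -> localCache.remove(Long.valueOf(id)));
    }

    // Serves from the local cache, loading from the database on a miss.
    String read(long id) {
        return localCache.computeIfAbsent(id, database::get);
    }

    // Commits to the database first, then tells every node (including this one)
    // to evict its cached copy of the changed record.
    void update(long id, String value) {
        database.put(id, value);
        topic.publish(id);
    }
}
```

With two nodes sharing one topic, an update through node A evicts the entry on node B, so B's next read goes back to the database instead of returning the stale value. The cost is one message per update per node, which is why this suits data that changes infrequently.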
 
Anand Sid
Greenhorn
Posts: 7
Originally posted by Mark Spritzler:
Personally, the more transactional and frequently changing the data is, the less I would cache it at all. Pretty soon all your resources go to updating all the servers with the latest data, leaving no processing power to support your clients, so the drag becomes worse than simply going to the database to get your data.

Data that remains the same gives the highest ROI when loaded into a distributed cache.


Thanks Mark. That's a very good point. I think I didn't ask the right question. There is certain data that is not updated frequently but still has a chance of being updated; those are the cases I have to look at.
Anyway, I don't have any concrete scenario at this point in time. I will post if I get one in the project. Thanks again.
 
Anand Sid
Greenhorn
Posts: 7
Originally posted by James Sutherland:
TopLink (and EclipseLink) provide several features for handling stale cached data. Which feature you choose depends on how often your data is updated and on your application's level of tolerance for stale data.


Thanks James. That is a very good list you have put up.
At the end of the day, as Mark says, caching settings have to be set based on the way the data is accessed or updated.

Let me put together one simple example (in a clustered environment):
1. Let's say an employee in a company creates an Order for, say, stationery and sends it to his manager for approval.
2. The manager can either approve the order for further processing or reject it, with or without modifying the order.
3. In most cases, if the manager rejects, he would not change anything in the Order. But say in one scenario he changes some value in the order and rejects it back to the employee.
4. In this case, in a clustered environment, there is a possibility of the employee seeing stale data that doesn't have the details updated by the manager.

Based on the ways to avoid stale data in Toplink, in this case I can either go for onlyRefreshCacheIfNewerVersion or use Cache Coordination for the Order descriptor.
(For onlyRefreshCacheIfNewerVersion, I presume that a database call will always be sent to check the version column whenever I query.)
Can you tell me which will be appropriate for the above scenario?
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
Well, I still don't know much of Toplink's API, but you might be able to write code in the manager approval step that evicts the data from the distributed cache.

I can give the Hibernate method call, then you would have to find the corresponding methods in Toplink.

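So in Hibernate, calling SessionFactory.evict(Class persistentClass, Serializable id) will remove that entity's data from the "distributed second level cache". (Session.evict(Object o), by contrast, only removes the object from the session's own first-level cache.)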

Are you using any Business Processing Modeling api like Jess or JBoss Rules?

Mark
 
syed aliarshad
Greenhorn
Posts: 11
Hello,

I guess I have the same situation here..

I am using Toplink as ORM. I have implemented caching too using cache coordination.

Below are the lines that I have added in Toplink-session.xml file for caching.

------------------------------------------------------------------------------------------------------------------------

<remote-command>
  <commands>
    <cache-sync>true</cache-sync>
  </commands>
  <transport xsi:type="jms-topic-transport">
    <topic-host-url>tcp://jms.someurl.com:61616</topic-host-url>
    <topic-connection-factory-name>java:comp/env/jms/CacheTopicConnectionFactory</topic-connection-factory-name>
    <topic-name>java:comp/env/jms/topicname</topic-name>
    <jndi-naming-service>
      <initial-context-factory-name>org.apache.activemq.jndi.ActiveMQInitialContextFactory</initial-context-factory-name>
    </jndi-naming-service>
  </transport>
</remote-command>
------------------------------------------------------------------------------------------------------------------------


The above settings work fine with one cache node. I want to turn the JMS setup into a cluster with more than one cache node. How can I achieve this?


Thanks
 
James Sutherland
Ranch Hand
Posts: 553
What server are you using? You just need to launch another server with the same settings.
 
syed aliarshad
Greenhorn
Posts: 11
I am using Tomcat.

I have 2 Tomcat nodes and 1 JMS cache node. I want to add one more JMS cache node for JMS clustering.

If we add one more URL, comma-separated, in the line given below, will it work?

<topic-host-url>tcp://someurl.com:61616</topic-host-url>

Thanks
 
James Sutherland
Ranch Hand
Posts: 553
No, topic-host-url is the URL of the node hosting the JMS topic; it must be the same host for all servers in the cluster.

You should not have to do anything different on either node. They should both connect to the same topic on the same JMS host and be in communication with each other.

What JMS implementation are you using?

 
syed aliarshad
Greenhorn
Posts: 11
We have 2 Tomcat nodes, but they share the same source code, so there is only one Toplink-sessions.xml file. We are adding one more cache node that will have a JMS broker set up.

Structure:
======
Tomcat Nodes: 2
JMS Cache Nodes: 1 (need to add one more to make a cluster)

thanks in advance
 
James Sutherland
Ranch Hand
Posts: 553
Sorry, I thought you wanted to cluster the Tomcat instances, not the JMS server.

What JMS implementation are you using? Clustering a JMS server depends on your JMS implementation and whether it supports this; it should not affect the JPA configuration.
 
syed aliarshad
Greenhorn
Posts: 11
Sorry for the late reply.

We are using a JMS 1.1 implementation. We need to provide failover settings like failover:(tcp://primary:61616,tcp://secondary:61616)?randomize=false.

Our toplink-sessions.xml file looks like this.

<transport xsi:type="jms-topic-transport">
  <topic-host-url>tcp://URL:61616</topic-host-url>
  <topic-connection-factory-name>java:comp/env/jms/CacheTopicConnectionFactory</topic-connection-factory-name>
  <topic-name>java:comp/env/jms/toplinktopic</topic-name>
  <jndi-naming-service>
    <initial-context-factory-name>org.apache.activemq.jndi.ActiveMQInitialContextFactory</initial-context-factory-name>
  </jndi-naming-service>
</transport>


We also need to implement a shared-nothing master/slave strategy for HA. Could you please help with this?

Thanks
 