
Data Retrieval in Clustered Environment

Hitesh Gupta
Greenhorn

Joined: Jul 03, 2008
Posts: 15
Hi

We have a Java process that polls the DB (mainframe DB2) at a predefined frequency. If the Java process finds data in the DB, it reads and processes that data and puts it on IBM MQ using JMS.

We are now planning to migrate it to WAS with support for load balancing and clustering. We have decided to use the Spring Batch framework.

Problem:

In a clustered environment, suppose some data is present in DB2 and the Java program running in one cluster reads it and sends it to MQ. How can we ensure that, once the data has been retrieved by one cluster, it is not picked up again by the Java program running in a different cluster? Otherwise, different clusters will put duplicate data on MQ.

Maybe we could put some flag checks in Java to handle this, but how can we achieve that in a multi-cluster environment?

Regards,
Hitesh.
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638


Maybe we could put some flag checks in Java to handle this, but how can we achieve that in a multi-cluster environment?

Put the flags in the DB, not in Java.
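
A minimal sketch of the flag idea, assuming the table could carry a status column. The table T1 and the columns ID, STATUS and CLAIMED_BY are hypothetical names, not taken from the vendor's schema. A single conditional UPDATE lets exactly one node flip a row from NEW to IN_PROGRESS; every other node sees an update count of 0 and skips the row.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch: claim a row by flipping a status flag with one conditional UPDATE. */
final class FlagClaim {

    // Hypothetical table and column names; the thread does not name them.
    static final String CLAIM_SQL =
        "UPDATE T1 SET STATUS = 'IN_PROGRESS', CLAIMED_BY = ? "
      + "WHERE ID = ? AND STATUS = 'NEW'";

    /** Returns true only for the node whose UPDATE actually changed the row. */
    static boolean tryClaim(Connection conn, long id, String nodeName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL)) {
            ps.setString(1, nodeName);
            ps.setLong(2, id);
            // 1 means this node won the claim; 0 means another node already took the row.
            return ps.executeUpdate() == 1;
        }
    }
}
```

Because the check and the change happen in one statement, the database arbitrates between nodes and no coordination is needed in the Java code itself.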

An alternative is to put processed and unprocessed data in different tables.

Let's assume you have a table T1 where new data is kept.
One process picks up the data from T1 ("select for update" ensures that nobody else processes the data at the same time). On successful processing, it deletes the data from T1 and adds it to another table, T2.
Since all your processes query T1 and not T2, no data will be processed twice.
Make sure that you take a lock on the data using "select for update" so that no two processes handle the same record concurrently.
[ July 03, 2008: Message edited by: Nitesh Kant ]
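
The T1/T2 approach described above could look roughly like this in plain JDBC. This is a sketch under assumptions: the columns ID and PAYLOAD and the Processor callback are hypothetical, and the exact locking clause and isolation level needed for "select for update" are database-specific, so check the DB2 documentation before relying on it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Sketch: lock a row in T1, process it, then move it to T2 in one transaction. */
final class MoveAfterProcessing {

    // Hypothetical column names; only the table names T1 and T2 come from the post.
    static final String LOCK_SQL   = "SELECT ID, PAYLOAD FROM T1 WHERE ID = ? FOR UPDATE";
    static final String INSERT_SQL = "INSERT INTO T2 (ID, PAYLOAD) VALUES (?, ?)";
    static final String DELETE_SQL = "DELETE FROM T1 WHERE ID = ?";

    /** Hypothetical callback standing in for "process the data and send it to MQ". */
    interface Processor {
        void process(long id, String payload) throws Exception;
    }

    /** Returns false if the row no longer exists in T1 (already moved by another node). */
    static boolean processOne(Connection conn, long id, Processor processor) throws Exception {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // the lock is held until commit or rollback
        try {
            String payload;
            try (PreparedStatement lock = conn.prepareStatement(LOCK_SQL)) {
                lock.setLong(1, id);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        conn.rollback();
                        return false;
                    }
                    payload = rs.getString("PAYLOAD");
                }
            }
            processor.process(id, payload);
            try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
                insert.setLong(1, id);
                insert.setString(2, payload);
                insert.executeUpdate();
            }
            try (PreparedStatement delete = conn.prepareStatement(DELETE_SQL)) {
                delete.setLong(1, id);
                delete.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (Exception e) {
            conn.rollback(); // the row stays in T1 and can be retried later
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Note that the JMS send is not covered by this JDBC transaction. If the message is sent and the commit then fails, the row stays in T1 and may be sent again, so a fully safe version would need the MQ send and the DB update in one XA transaction.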

Hitesh Gupta
Greenhorn

Joined: Jul 03, 2008
Posts: 15
Nitesh,

Thanks for your reply. The suggestion is a good one, but here all the data is provided by a third-party vendor, and I can't change the design to move retrieved data into a new table.
So I was wondering whether we can make this happen through Java only.

Regards,
Hitesh.
Kyle Brown
author
Ranch Hand

Joined: Aug 10, 2001
Posts: 3892
    
Would it be possible to add a trigger to the DB2 table that adds the data to MQ directly rather than polling for it in Java?


Kyle Brown, Author of Persistence in the Enterprise and Enterprise Java Programming with IBM Websphere, 2nd Edition
See my homepage at http://www.kyle-brown.com/ for other WebSphere information.
Hitesh Gupta
Greenhorn

Joined: Jul 03, 2008
Posts: 15
Kyle,

How would I interact with MQ from DB2?
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
How can we ensure that, once the data has been retrieved by one cluster, it is not picked up again by the Java program running in a different cluster? Otherwise, different clusters will put duplicate data on MQ.


You can develop a monitor that watches the activity of the cluster members and the queue via correlation ID. Once a set of data has been retrieved, the monitor ensures that no other cluster takes the data.
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

hitesh:
So I was wondering whether we can make this happen through Java only.


I think it will be difficult to do this in a Java app sitting on one of the nodes.
Moreover, the fact that a record has been processed must be persisted, so that even if the node that processed the data crashes, the other nodes still know that it has been processed.

In my opinion you have two options:

1) Run a single instance of your Java application, i.e. as a separate process.
2) Write a process that copies data from the main table to a secondary database that you control. The processing threads then use the table in this secondary database.
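
A variation on option 2, sketched under assumptions: instead of copying the data, each node records a claim in a table in the secondary database before processing a record. The table CLAIMED with a primary key on RECORD_ID is hypothetical. The primary key makes the second insert for the same record fail, and because the claim is persisted, it survives a crash of the node that made it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch: use a primary-key insert in a secondary database as a persistent claim. */
final class ClaimTable {

    // Hypothetical table: CLAIMED (RECORD_ID BIGINT PRIMARY KEY, NODE_NAME VARCHAR)
    static final String CLAIM_SQL =
        "INSERT INTO CLAIMED (RECORD_ID, NODE_NAME) VALUES (?, ?)";

    /** SQLSTATE class 23 is the standard class for integrity constraint violations. */
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    /** Returns true if this node recorded the claim, false if the record was already claimed. */
    static boolean tryClaim(Connection secondaryDb, long recordId, String nodeName)
            throws SQLException {
        try (PreparedStatement ps = secondaryDb.prepareStatement(CLAIM_SQL)) {
            ps.setLong(1, recordId);
            ps.setString(2, nodeName);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false; // another node inserted the same RECORD_ID first
            }
            throw e; // anything else is a real error
        }
    }
}
```

A node would call tryClaim before reading the record from the vendor's table and skip the record when it returns false. Claims left behind by a node that crashed mid-processing would still need some cleanup or timeout rule, which this sketch does not cover.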
 