High volume data processing question

Derick Potgieter
Ranch Hand

Joined: Feb 19, 2004
Posts: 53
Hi All,

Here is an interesting question that I need some different views on.

We have designed a solution (quite common) that uses a timer bean that triggers every 2 min to read data from a database and process it.

The issue is that, with the amount of processing we are doing, we get a throughput of 2 records/second; this includes conversions, extra database writes, etc. CPU utilization is about 2-5% (huge, huge cluster).

Normally this would not be an issue, but since we receive 5000 records at 10 in the morning and all of them are equally important to process right away, we are a bit stuck (total time about 45 min). Here is my new solution, which I need help with:

1. The database we are reading from gets an extra column that sequences the data based on the number of timers registered.

2. Based on an MBean's config, I start multiple timers.

3. Each timer now reads only the rows for its own sequence number (timer 3 only reads rows where the TIMER_SEQUENCE column is 3) and places them on a JMS queue with the messageSelector property set to the timer's sequence.

This basically allows us to process in parallel.

4. Multiple MDBs get registered to listen for their respective messageSelector value, and processing happens as normal.

If I'm correct, the only thing I need to configure statically is the MDBs, so that each listens for its respective messageSelector value, say from 1 to 10.

If we get multiple messages on the queue (which we will, as the multiple timers read and send JMS messages), the MDBs will scale and create more instances that listen on the same queue and for the same messageSelector property.
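
To make the partitioning concrete, here is a minimal sketch in plain Java, with no JMS or database involved. The assignment rule (record id modulo the number of timers) and the use of TIMER_SEQUENCE as a message property name are assumptions for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: shows how rows could be split across N timers and which
// selector string the matching MDB would use. Not container code.
class SequencePartitioning {

    /** Maps a record id to a sequence number in 1..timers (hypothetical rule). */
    static int sequenceFor(long recordId, int timers) {
        return (int) Math.floorMod(recordId, (long) timers) + 1;
    }

    /** Selector string an MDB for that sequence might be deployed with (assumed property name). */
    static String selectorFor(int sequence) {
        return "TIMER_SEQUENCE = " + sequence;
    }

    /** Groups record ids by the sequence, and so the timer, that would pick them up. */
    static Map<Integer, List<Long>> partition(List<Long> recordIds, int timers) {
        Map<Integer, List<Long>> bySequence = new TreeMap<>();
        for (long id : recordIds) {
            bySequence.computeIfAbsent(sequenceFor(id, timers), k -> new ArrayList<>()).add(id);
        }
        return bySequence;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 12; id++) {
            ids.add(id);
        }
        // Print which ids each selector would receive with 3 timers.
        partition(ids, 3).forEach((seq, batch) ->
                System.out.println(selectorFor(seq) + " -> " + batch));
    }
}
```

With a rule like this, changing the number of timers means re-sequencing the rows and redeploying the selectors, which is part of what makes the setup static.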

Can anyone see pitfalls or flaws that I have missed? Processing this amount of data as fast as possible is quite new to me.

Any other ideas are also welcome.


Steve McLeod

Joined: May 26, 2008
Posts: 11
Derick, before implementing your solution, which sounds like a pain to debug and maintain, what about this:

Carefully profile your code, finding the performance bottlenecks. Fix them where possible. Measuring with


should be enough.

You'd have to be doing a heck of a lot of processing of each record to get a maximum throughput of only 2 per second.

Also, are you batching your database writes? Do you have appropriate indexes for your most common database reads? That's often the cause of really big performance problems, but thankfully the solution is quick and easy.
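
For the profiling step, one simple option, assuming nothing fancier than System.nanoTime() around each processing stage, is sketched below. The StageTimer class and the stage names are hypothetical, and busyWork only simulates load; a real profiler will give a more detailed picture.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: accumulates wall-clock time per named stage.
// Not thread-safe; use one instance per thread or add synchronization.
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();
    private static volatile double sink; // keeps the JIT from discarding busyWork

    /** Runs the work and adds its elapsed time to the running total for that stage. */
    void time(String stage, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a stage, or 0 if it was never timed. */
    long totalNanos(String stage) {
        return nanosByStage.getOrDefault(stage, 0L);
    }

    Map<String, Long> totals() {
        return nanosByStage;
    }

    /** Simulated CPU load standing in for real processing. */
    static void busyWork(int iterations) {
        double acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 100; i++) {
            timer.time("unmarshal", () -> busyWork(20_000)); // hypothetical stage
            timer.time("persist", () -> busyWork(5_000));    // hypothetical stage
        }
        timer.totals().forEach((stage, nanos) ->
                System.out.println(stage + ": " + (nanos / 1_000_000) + " ms"));
    }
}
```

Wrapping each stage of the per-record processing this way shows which stage dominates the half second spent per record.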

Derick Potgieter
Ranch Hand

Joined: Feb 19, 2004
Posts: 53
Thanks for the response!

There are for sure some optimizations that we can do, but I can't see us getting past 10 records a second. It has more to do with the actual processing and the amount of processing we are performing. The payload we are using is also quite substantial, and unmarshalling it using JAXB, plus changes, comparisons, additional lookups, and persisting this schema into multiple tables, is quite intensive.

I'm really looking for a pattern for mass real-time processing.

We are using this application in a bank, and the size and volume of the data are quite intense.

We will optimize the application, but I still need a way to process this as fast as possible.
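
As a rough sanity check on these numbers, the sketch below estimates how long the 5000-record burst takes to drain at 2 and at 10 records per second, and with several consumers in parallel. It assumes throughput scales linearly with the number of consumers, which is optimistic; the class name and the figure of 10 consumers are illustrative.

```java
// Back-of-the-envelope drain time for the morning burst.
// Assumes linear scaling across consumers, which a shared database
// and locking will usually prevent in practice.
class DrainTimeEstimate {

    /** Seconds to process `records` at `perSecond` per consumer with `consumers` running in parallel. */
    static double drainSeconds(int records, double perSecond, int consumers) {
        return records / (perSecond * consumers);
    }

    public static void main(String[] args) {
        int burst = 5000; // records arriving at once, as stated in the thread
        System.out.println("2/s, 1 consumer:    " + drainSeconds(burst, 2.0, 1) / 60 + " min");
        System.out.println("10/s, 1 consumer:   " + drainSeconds(burst, 10.0, 1) / 60 + " min");
        System.out.println("2/s, 10 consumers:  " + drainSeconds(burst, 2.0, 10) / 60 + " min");
    }
}
```

That works out to roughly 42 minutes at 2/s, about 8 minutes at 10/s, and about 4 minutes at 2/s with 10 consumers if scaling were perfectly linear.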
Roger Chung-Wee
Ranch Hand

Joined: Sep 29, 2002
Posts: 1683
What is the typical size of the objects you will pass into each MDB's onMessage() method?

SCJP 1.4, SCWCD 1.3, SCBCD 1.3
Derick Potgieter
Ranch Hand

Joined: Feb 19, 2004
Posts: 53
About 0.5 KB to 5 KB, with small bursts over 5 KB up to a max of 20 KB.

Server specs: quad-core Xeon, 16 GB RAM, SAN storage as the primary volume.

[ May 27, 2008: Message edited by: Derick Potgieter ]
Mark McMillan

Joined: May 16, 2008
Posts: 5
I may not be completely understanding your requirements, but it seems you have a burst of activity (lots of DB records) and you need to process them in parallel, but you have only a single timer bean running.

What if the timer bean only scans the incoming records and, for each one, posts a message which contains the record ID (primary key or whatever is necessary to identify a particular record)? An MDB receives the messages and processes them one at a time by reading the incoming DB record and doing whatever (2-per-second) processing is needed.

Now, configure your server to support whatever degree of parallelism you want for the MDB. For example, in WebSphere you set "Maximum concurrent endpoints" to the number of (concurrent) instances of the MDB the server will create to handle messages from that particular JMS queue. If you set this value to 10, then the server will create up to 10 instances of the MDB, each running in parallel and pulling messages off the same queue.
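
The shape of this design can be sketched in plain Java, using a fixed thread pool as a stand-in for the container's MDB pool and the pool's work queue as a stand-in for the JMS queue. This is only an analogy, not container code: processRecord, scanAndDispatch, and the pool size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: one scanner hands record ids to a bounded pool of workers.
class IdQueueSketch {

    /** Stand-in for the per-record work an MDB's onMessage() would do. */
    static void processRecord(long recordId, AtomicInteger processed) {
        // Real code would load the row by id, unmarshal, compare, and persist.
        processed.incrementAndGet();
    }

    /** "Timer" side: submits one task per id; returns how many records were processed. */
    static int scanAndDispatch(List<Long> pendingIds, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        AtomicInteger processed = new AtomicInteger();
        for (long id : pendingIds) {
            pool.submit(() -> processRecord(id, processed));
        }
        pool.shutdown(); // no new work; queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 5000; id++) {
            ids.add(id);
        }
        // 10 mirrors the example value of 10 concurrent MDB instances.
        System.out.println("processed " + scanAndDispatch(ids, 10) + " records");
    }
}
```

Because every worker pulls from the same queue, a slow record only delays one worker, and the degree of parallelism is a single setting rather than a set of selectors.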
Babu Rengarajan

Joined: Dec 26, 2006
Posts: 2
I agree with Mark: there is no need for message selectors to scale MDBs. It can be achieved by increasing the number of (concurrent) endpoints in WebSphere, and I am sure there will be a similar setting in other app servers.
Also, increasing the batch size to more than one should improve MDB performance by allowing the MDB to pick up messages from the queue in batches. Implementing message selectors is an overhead, not a performance improvement.
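
The effect of a batch size above one can be illustrated in plain Java with BlockingQueue.drainTo, which takes up to a given number of queued items in one call. This is only an analogy for the container's batch setting; the class name, method name, and batch size of 10 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: a consumer that takes messages off a queue in batches.
class BatchDrainSketch {

    /** Empties the queue in batches of at most batchSize and returns the batches in order. */
    static List<List<Long>> drainInBatches(BlockingQueue<Long> queue, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        while (true) {
            List<Long> batch = new ArrayList<>(batchSize);
            // drainTo moves up to batchSize elements and returns how many it moved.
            if (queue.drainTo(batch, batchSize) == 0) {
                break;
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        for (long id = 1; id <= 25; id++) {
            queue.add(id);
        }
        for (List<Long> batch : drainInBatches(queue, 10)) {
            System.out.println("batch of " + batch.size() + ": " + batch);
        }
    }
}
```

Fewer round trips to the queue per message is where the gain comes from; whether it matters here depends on how the queue overhead compares with the per-record processing time.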