Here is an interesting question that I need some different views on.
We have designed a (quite common) solution that uses a timer bean that triggers every 2 minutes to read data from a database and process it.
The issue is that, with the amount of processing we are doing, we get a throughput of 2 records/second; this includes conversions, extra database writes, etc. Our CPU utilization is about 2-5% (huge... huge cluster).
Normally this would not be an issue, but seeing that we receive 5,000 records at 10 in the morning, and that all of them are equally important to process right away... we are a bit stuck. (Total time: +- 45 minutes.)
So... here is my new solution, which I need help on:
1. The database we are reading from gets an extra column that sequences the data based on the number of timers registered.
2. Based on an MBean's config, I start multiple timers.
3. Each timer now reads only the data for its respective sequence number (timer 3 only reads data where the TIMER_SEQUENCE column is 3) and places it on a JMS queue with the messageSelector property set to the sequence of the timer.
This basically allows us to process in parallel
4. Multiple MDBs get registered to listen for their respective messageSelector value, and processing happens as normal.
If I'm correct, the only thing I need to configure statically is the MDBs to listen for their respective messageSelector values, say from 1 to 10.
If we get multiple messages on the queue (which we are going to, as the multiple timers read and publish JMS messages), the MDBs will scale and create more instances that listen on the same queue for the same messageSelector property.
Anyone see pitfalls or flaws that I may have missed? Processing this amount of data as fast as possible is quite new to me.
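The partitioning in steps 1-4 can be sketched in plain Java. This is a minimal, hypothetical helper (the class and method names are my own, not from the original post): it assigns each record a TIMER_SEQUENCE round-robin by primary key, and builds the JMS message selector string each MDB would be configured with.

```java
// Sketch of the partitioning scheme described above (names are hypothetical).
// Each record gets a TIMER_SEQUENCE in 1..N; timer i reads only sequence i,
// and the matching MDB filters with a JMS message selector on that value.
public class SequencePartitioner {
    private final int timerCount;

    public SequencePartitioner(int timerCount) {
        this.timerCount = timerCount;
    }

    /** Assign a record to one of the N timers, round-robin by primary key. */
    public int sequenceFor(long recordId) {
        return (int) (recordId % timerCount) + 1; // values 1..timerCount
    }

    /** The selector string a given MDB would be statically configured with. */
    public String selectorFor(int sequence) {
        return "TIMER_SEQUENCE = " + sequence;
    }
}
```

With 10 timers, records 0, 10, 20... all land on timer 1, so the burst of 5,000 records is split roughly evenly across the parallel readers.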
Derick, before implementing your solution (which sounds like a pain to debug and maintain), what about this:
Carefully profile your code to find the performance bottlenecks, and fix them where possible. Measuring with a profiler
should be enough.
You'd have to be doing a heck of a lot of processing of each record to get a maximum throughput of only 2 per second.
Also, are you batching your database writes? Do you have appropriate indexes for your most common database reads? That's often the cause of really big performance problems, but thankfully the solution is quick and easy.
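To illustrate the batching point: the shape of the fix is to accumulate rows and write them in chunks rather than one statement per record, which with JDBC means `PreparedStatement.addBatch()`/`executeBatch()`. Since that needs a live database, here is a self-contained, hypothetical buffer with the same shape (the class name and flush callback are my own invention for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: accumulates rows and hands them off in chunks, the same
// shape you'd use with PreparedStatement.addBatch()/executeBatch() in JDBC.
public class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. executeBatch() + commit
    private final List<T> pending = new ArrayList<>();
    private int flushes = 0;

    public BatchWriter(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public void add(T row) {
        pending.add(row);                // analogous to stmt.addBatch()
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {               // analogous to stmt.executeBatch()
        if (!pending.isEmpty()) {
            flushAction.accept(new ArrayList<>(pending));
            pending.clear();
            flushes++;
        }
    }

    public int flushCount() { return flushes; }
}
```

Writing 5,000 records in batches of 100 means 50 round trips to the database instead of 5,000, which is often where "really big" wins come from.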
There are for sure some optimizations that we can do, but I can't see us getting past 10 records a second. It has more to do with the actual processing and the amount of processing we are performing. The payload we are using is also quite substantial, and unmarshalling it using JAXB, plus changes, comparisons, additional lookups, and persisting this schema into multiple tables, is quite intensive.
I'm really looking for a mass real-time processing pattern type of implementation.
We are using this application in a bank, and the size of the data plus the volume is quite intense.
We will optimize the application, but I still need a way to process this as fast as possible.
I may not be completely understanding your requirements, but it seems you have a burst of activity (lots of DB records) and you need to process them in parallel, but you have only a single timer bean running.
What if the timer bean only scans the incoming records, and for each one posts a message which contains the record ID (primary key or whatever is necessary to identify a particular record). An MDB receives the messages and processes them one at a time by reading the incoming DB record and doing whatever (2 second) processing is needed.
Now, configure your server to support whatever degree of parallelism you want for the MDB. For example, in WebSphere you configure the "Maximum concurrent endpoints" to the number of (concurrent) instances of the MDB the server will create to handle messages from that particular JMS queue. If you set this value to 10, then the server will create up to 10 instances of the MDB, each running in parallel, pulling messages off the same queue.
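The fan-out Mark describes can be simulated outside a container: one producer (the timer bean) posts record IDs to a shared queue, and a fixed pool of workers (standing in for the container-managed MDB instances) drains it in parallel. This is a plain-Java sketch under my own assumed names; in a real deployment the queue would be a JMS destination and the pool size the server's concurrency setting.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java simulation of the pattern: a single "timer" posts record IDs,
// and a fixed pool of workers (the MDB instances) processes them in parallel.
public class ParallelScan {
    public static int process(int recordCount, int poolSize) throws InterruptedException {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>(); // stands in for the JMS queue
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        // The "timer bean": scan incoming records and post only their IDs.
        for (long id = 0; id < recordCount; id++) {
            queue.add(id);
        }

        // The "MDBs": each worker takes an ID and does the heavy processing.
        for (int i = 0; i < poolSize; i++) {
            pool.submit(() -> {
                Long id;
                while ((id = queue.poll()) != null) {
                    // in the real system: load the DB record by ID and process it
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return processed.get();
    }
}
```

Note that no message selectors are involved: all workers compete for the same queue, and the partitioning falls out for free.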
I agree with Mark: there is no need for message selectors to scale MDBs. It can be achieved by increasing the number of (concurrent) endpoints in WebSphere, and I am sure there will be a similar setting in other app servers. Also, increasing the batch size to more than one should improve MDB performance by allowing the MDB to pick up messages in a batch from the queue. Implementing a message selector is an overhead, not a performance improvement.
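The selector-free configuration suggested above might look something like this (a sketch, not a drop-in: the destination name is hypothetical, and the concurrency property is server-specific, e.g. `maxSession` on JBoss/WildFly versus the "Maximum concurrent endpoints" setting in the WebSphere admin console):

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// No message selector at all: concurrency comes entirely from the container's
// MDB instance pool. Property names and values are illustrative.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "jms/RecordQueue"),
    @ActivationConfigProperty(propertyName = "maxSession",
                              propertyValue = "10")
})
public class RecordProcessorMDB implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // read the record ID from the message, load the row, process it
    }
}
```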