High volume data processing question

 
Ranch Hand
Posts: 53
Hi All,

Here is an interesting question that I'd like some different views on.

We have designed a (quite common) solution that uses a timer bean that fires every 2 minutes to read data from a database and process it.

The issue is that, with the amount of processing we are doing, we get a throughput of about 2 records per second; this includes conversions, extra database writes, etc. Our CPU utilization is about 2-5% (huge... huge cluster).

Normally this would not be an issue, but we receive 5000 records at 10 in the morning, and all of them are equally important to process right away, so we are a bit stuck (total time roughly 45 minutes).
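
As a rough sanity check on those numbers: at 2 records per second, 5000 records take 2500 seconds, or about 42 minutes, which is in line with the total time quoted. The sketch below just does that arithmetic for a few worker counts. It assumes, optimistically, that throughput scales linearly with the number of parallel workers and that the database keeps up; the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate only: assumes every worker sustains the same
// rate and that nothing shared (database, queue) becomes the bottleneck.
class DrainTimeEstimate {

    /** Seconds needed to drain `records` with `workers` parallel consumers. */
    static double secondsToDrain(int records, double perWorkerRatePerSec, int workers) {
        return records / (perWorkerRatePerSec * workers);
    }

    public static void main(String[] args) {
        int records = 5000;   // size of the morning burst from the post
        double rate = 2.0;    // records per second, as measured today
        for (int workers : new int[] {1, 5, 10}) {
            double minutes = secondsToDrain(records, rate, workers) / 60.0;
            System.out.printf("%2d worker(s): %.1f min%n", workers, minutes);
        }
    }
}
```

With CPU at 2-5%, there is at least headroom for more workers, although the real scaling factor would have to be measured.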

So... here is my new solution, which I need help with:

1. The database table we are reading from gets an extra column that sequences the data based on the number of timers registered.

2. Based on an MBean's config, I start multiple timers.

3. Each timer now reads only the rows with its own sequence number (timer 3 only reads rows where the TIMER_SEQUENCE column is 3) and places them on a JMS queue, with a message property (the one the messageSelector matches on) set to the timer's sequence.

This basically allows us to process in parallel.

4. Multiple MDBs are registered, each listening for its respective messageSelector value, and processing happens as normal.

If I'm correct, the only thing I need to configure statically is the MDBs, so that each listens for its respective messageSelector value, say from 1 to 10.

If we get multiple messages on the queue (which we will, as the multiple timers read and post JMS messages), the MDBs will scale and create more instances that listen on the same queue and for the same messageSelector value.
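
To make the partitioning in steps 1 to 4 concrete, here is a minimal plain-Java sketch of how rows could be stamped with a sequence and how each timer would pick up only its own share. The modulo rule, the class name and the method names are assumptions for illustration; the post does not say how the TIMER_SEQUENCE value is actually assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the table with a TIMER_SEQUENCE column.
class SequencePartitionSketch {

    /** Sequence number (1..timerCount) a record would be stamped with (assumed rule: id modulo timers). */
    static int timerSequenceFor(long recordId, int timerCount) {
        return (int) Math.floorMod(recordId, (long) timerCount) + 1;
    }

    /** What "timer N" would read: only the rows whose sequence equals N. */
    static List<Long> recordsForTimer(List<Long> recordIds, int timer, int timerCount) {
        List<Long> mine = new ArrayList<>();
        for (long id : recordIds) {
            if (timerSequenceFor(id, timerCount) == timer) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 20; id++) {
            ids.add(id);
        }
        int timerCount = 4;
        for (int timer = 1; timer <= timerCount; timer++) {
            // The matching JMS selector on the consumer side would look like: TIMER_SEQUENCE = 3
            System.out.println("timer " + timer + " -> " + recordsForTimer(ids, timer, timerCount));
        }
    }
}
```

One thing the sketch makes visible: the number of partitions is fixed when the rows are stamped, so changing the number of timers later means re-sequencing any unprocessed rows.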

Can anyone point out pitfalls and flaws that I haven't seen? Processing this amount of data as fast as possible is quite new to me.

Any other ideas also welcome.

Thanks
Derick
 
Greenhorn
Posts: 11
Derick, before implementing your solution, which sounds like a pain to debug and maintain, what about this:

Carefully profile your code to find the performance bottlenecks, and fix them where possible. Measuring with

System.out.println(System.currentTimeMillis());

should be enough.
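
If scattered println calls get hard to read, a slightly more structured variant of the same idea is to accumulate elapsed time per processing stage. This is only a hand-rolled sketch using System.nanoTime(); the class name and the stage names ("convert", "persist") are placeholders, not anything from the actual application.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal hand-rolled stage timer; not a replacement for a real profiler.
class StageTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    /** Runs `work`, adds its elapsed time to the named stage, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time recorded for a stage, in whole milliseconds (0 if never timed). */
    long totalMillis(String stage) {
        return totalNanos.getOrDefault(stage, 0L) / 1_000_000;
    }

    Map<String, Long> totals() {
        return totalNanos;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 3; i++) {
            String payload = timer.time("convert", () -> "record");   // stand-in for the conversions
            timer.time("persist", () -> payload.length());            // stand-in for the database writes
        }
        timer.totals().forEach((stage, nanos) ->
                System.out.println(stage + ": " + nanos / 1_000_000.0 + " ms"));
    }
}
```

Summing per stage over a few hundred records shows quickly whether the time goes into conversion or into the database.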

You'd have to be doing a heck of a lot of processing of each record to get a maximum throughput of only 2 per second.

Also, are you batching your database writes? Do you have appropriate indexes for your most common database reads? That's often the cause of really big performance problems, but thankfully the solution is quick and easy.
 
Derick Potgieter
Ranch Hand
Posts: 53
Thanks for the response!

There are for sure some optimizations that we can do, but I can't see us getting past 10 records a second. It has more to do with the actual processing and the amount of processing we are performing. The payload we are using is also quite substantial, and unmarshaling it using JAXB, plus changes, comparisons, additional lookups and persisting this schema into multiple tables, is quite intensive.

What I'm really looking for here is a pattern for mass real-time processing.

We are using this application in a bank, and the size of the data plus the volume is quite intense.

We will optimize the application, but I still need a way to process this as fast as possible.
 
Ranch Hand
Posts: 1683
What is the typical size of the objects you will pass into each MDB's onMessage() method?
 
Derick Potgieter
Ranch Hand
Posts: 53
About 0.5 KB to 5 KB, with small bursts over 5 KB up to a max of 20 KB.

Server specs: quad-core Xeon, 16 GB RAM, SAN storage as primary volume.

Rgds
D
[ May 27, 2008: Message edited by: Derick Potgieter ]
 
Greenhorn
Posts: 5
I may not be completely understanding your requirements, but it seems you have a burst of activity (lots of DB records) and you need to process them in parallel, but you have only a single timer bean running.

What if the timer bean only scans the incoming records, and for each one posts a message which contains the record ID (primary key or whatever is necessary to identify a particular record)? An MDB receives the messages and processes them one at a time by reading the incoming DB record and doing whatever processing is needed (roughly half a second per record at your current throughput).

Now, configure your server to support whatever degree of parallelism you want for the MDB. For example, in WebSphere you set "Maximum concurrent endpoints" to the number of (concurrent) instances of the MDB the server may create to handle messages from that particular JMS queue. If you set this value to 10, then the server will create up to 10 instances of the MDB, each running in parallel and pulling messages off the same queue.
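
The shape of this "one scanner, many consumers on one queue" approach can be sketched with plain JDK classes. This is not MDB or JMS code: a fixed thread pool stands in for the MDB instances, its internal work queue stands in for the JMS queue, and the pool size plays the role of "Maximum concurrent endpoints". All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK stand-in for a timer that posts record IDs and a pool of consumers.
class CompetingConsumersSketch {

    /** Placeholder for the real per-record work (load row by ID, convert, persist). */
    static void processRecord(long recordId, Set<Long> done) {
        done.add(recordId);
    }

    /** Submits one task per record ID and waits until all of them have been processed. */
    static Set<Long> processAll(List<Long> recordIds, int maxConcurrent) {
        Set<Long> done = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (long id : recordIds) {                 // the "timer" only posts IDs
                pending.add(pool.submit(() -> processRecord(id, done)));
            }
            for (Future<?> f : pending) {
                f.get();                                // surfaces any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return done;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 5000; id++) {
            ids.add(id);
        }
        Set<Long> done = processAll(ids, 10);           // 10 workers, like 10 concurrent endpoints
        System.out.println("processed " + done.size() + " records");
    }
}
```

Because every consumer pulls from the same queue, the degree of parallelism is a single setting and no record needs to be assigned to a particular consumer up front.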
 
Greenhorn
Posts: 2
I agree with Mark: there is no need for message selectors to scale MDBs. It can be achieved by increasing the number of (concurrent) endpoints in WebSphere, and I am sure there is a similar setting in other app servers.
Also, increasing the batch size to more than one should improve MDB performance, since it allows the MDB to pick up messages from the queue in batches. Implementing message selectors is an overhead, not a performance improvement.