In my application, I need to read around 300,000 records, perform some business logic on each record, and then update the database. I am thinking of the following solution: a class reads the records from the database, groups them into batches of 200, and posts each batch to an MDB. Each MDB would call the service that does the business logic and update the database. Is this a good approach, or is there another way to improve the performance? Also, I want to know when all the MDBs are done processing. How can I do that?
Why do you think using JMS will improve the performance here? Is the process you explained some kind of cron job? If yes, then you would probably be better off using a thread pool and processing each task in a different thread. You really do not need an MDB here.
The only reason to use JMS or JavaSpaces or any "Grid Computing" configuration would be if you are able to put multiple separate machines to work in a local network. If you are on one machine, Nitesh's suggestion for a threadpool of worker threads sounds good.
What have you measured so far? I would be looking for things like:
1. CPU time to read a batch of records
2. CPU time to do the business logic on a single record
3. CPU time to write a batch of records
4. Amount of network bandwidth consumed
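A minimal sketch of how such measurements could be taken in plain Java, assuming each phase (read, business logic, write) can be wrapped in a `Runnable`. The class name `PhaseTimer` and the dummy workload are hypothetical. It records wall-clock time with `System.nanoTime()` and per-thread CPU time with `ThreadMXBean`, where the JVM supports it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class PhaseTimer {

    // Runs the task on the current thread and returns {wallNanos, cpuNanos}.
    // cpuNanos is -1 when the JVM does not support per-thread CPU timing,
    // and may be 0 when CPU timing is supported but disabled.
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = bean.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? bean.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();
        task.run();
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = cpuSupported ? bean.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] { wallNanos, cpuNanos };
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "business logic on a batch of 200 records".
        long[] t = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 200; i++) {
                sum += (long) i * i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        });
        System.out.println("wall ns: " + t[0] + ", cpu ns: " + t[1]);
    }
}
```

A large gap between wall time and CPU time for the read or write phase would suggest the thread is mostly waiting on the database or the network rather than computing.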
I agree with Nitesh and William on this 100%. MDBs could be overkill here if the only goal is to process 300,000 records.
It would be worth checking how many indexes are affected by the biz logic and the database update. That directly impacts the time spent on these 300,000 records. It would also hit the users who are currently active in your system: your updates may in turn update certain indexes while active users are accessing those same tables and indexes in parallel.
1. Read them in bulk
2. Hand these off to a ThreadPoolExecutor (some thread pool to do the biz logic)
3. Update the DB (check whether a batch update can work for you here, to avoid a commit on every record)
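The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the records are plain integers, `applyBusinessLogic` is a hypothetical stand-in for the real rules call, and the database read and write are left as comments. Collecting the `Future` of each batch and calling `get()` on it is one way to know when all the work is done, which was part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchPipeline {

    // Hypothetical stand-in for the real business logic.
    static int applyBusinessLogic(int record) {
        return record * 2;
    }

    // Splits the records into batches, processes each batch on a pool thread,
    // and returns only after every batch has finished.
    static List<Integer> processAll(List<Integer> records, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < records.size(); start += batchSize) {
                int end = Math.min(start + batchSize, records.size());
                List<Integer> batch = records.subList(start, end);
                futures.add(pool.submit(() -> {
                    List<Integer> out = new ArrayList<>(batch.size());
                    for (int record : batch) {
                        out.add(applyBusinessLogic(record));
                    }
                    // A real job would write this batch back here, for example
                    // with JDBC addBatch()/executeBatch() and one commit per batch.
                    return out;
                }));
            }
            List<Integer> results = new ArrayList<>(records.size());
            for (Future<List<Integer>> future : futures) {
                results.addAll(future.get()); // blocks until that batch is done
            }
            return results; // reaching this line means all batches completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // In the real job this list would come from a bulk read of the database.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 300_000; i++) {
            records.add(i);
        }
        List<Integer> done = processAll(records, 200, 8);
        System.out.println("processed " + done.size() + " records");
    }
}
```

The batch size of 200 comes from the original question; the pool size of 8 is an arbitrary placeholder that would need tuning against the database and the available CPUs.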
Thanks for all your suggestions. We were thinking of MDBs because the business logic lives in ILOG, which is invoked through a Session EJB. If I take each and every record and process them sequentially, it would take a lot of time. If I use MDBs, the records would be processed concurrently, and that would improve the performance. One of the SLAs is that we need to complete the entire process within an hour. We are just in the initial stage of the project and are deciding on the best possible way to go about it, so we have not measured any performance parameters yet. Since we have strict time constraints, we are brainstorming various approaches so that we can mitigate any risk arising from performance issues in later stages.
I understand your concern. But with this, you are pushing the multi-threading work onto the container so that you can concentrate on invoking the biz logic and updating the database. I see the following concerns. Obviously you are the best person to judge them for your project.
1. Since the MDBs and the session beans would reside in the same container, the number of threads required during this update would be high.
2. With a standalone program, you have the option to run it on a different machine with a dedicated CPU. Basically, threads would get more CPU time with the standalone approach than with MDB + SessionBean residing in the same container.
3. What happens when the system is heavily loaded and this process kicks in? Is there a way to control it? You may want to consider invoking the MDBs only during a specific timeframe when the load on the server is minimal.
If you have a single thread that fetches a record from the DB, invokes the biz logic and updates the DB, then the response time per record has to be at most 0.012 seconds (3600 seconds / 300,000 records).
Obviously a multithreaded approach can improve on that. However, you may want to quickly check whether the single-threaded behavior already meets that response time. If a record takes less than that, you know it is not a problem as of now.
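The arithmetic above can be written out as a small calculation. This sketch assumes the work divides evenly across worker threads and scales perfectly, which rarely holds in practice once the database becomes the bottleneck; the class name `TimeBudget` and the worker count of 8 are hypothetical.

```java
public class TimeBudget {

    // Milliseconds each record may take if `workers` threads share the
    // records evenly and the whole run must fit into `windowSeconds`.
    static double budgetMillisPerRecord(long windowSeconds, long records, int workers) {
        return windowSeconds * 1000.0 * workers / records;
    }

    public static void main(String[] args) {
        // One hour, 300,000 records, single thread: 12 ms per record.
        System.out.println(budgetMillisPerRecord(3600, 300_000, 1));
        // Same window with 8 ideal workers: 96 ms per record.
        System.out.println(budgetMillisPerRecord(3600, 300_000, 8));
    }
}
```

Comparing a measured per-record time against these budgets gives a rough idea of how many workers would be needed, before any allowance for contention or overhead.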