I have to fix an application that uses a single transaction to update several thousand database records in a batch-like job. The updates are simple: a single timestamp is written to each record. JPA is used. Unfortunately the entities are detached, modified and then merged again (odd architecture, but nothing I can do about that now). With a few thousand records the application is sufficiently fast, but with more than 10,000 the transaction never completes. The job has to complete in an all-or-nothing way, hence the single transaction.
We have profiled the application and cannot see any resource bottlenecks, neither in the application server nor in the database.
Are there any possibilities to influence this behavior via configuration or simple modifications of the code?
What would be the best way to achieve atomicity in large batch-like jobs?
When you say the transaction never completes, what do you actually see? An error message? If so, what is it?
With more information about the problem we can give a more specific answer.
I have not found any exceptions in the logs. The processing gets extremely slow without any noticeable resource bottleneck. After more than two days we killed the application server, because the whole process is supposed to finish within a few hours.
I see. I think the process was still running when you killed it, just at a very slow pace.
I had a similar case before.
Well, in JPA/Hibernate there are well-known code patterns for handling big batches.
Basically you keep only about 1000 objects in the persistence context at a time: you flush them to the database, clear the context, then take care of the next 1000, and so on.
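The flush/clear pattern above can be sketched roughly like this. This is only a sketch: `Task`, the field names and the batch size of 1000 are made up for illustration, and `EntityManager` here is a tiny stand-in interface so the example is self-contained. In the real job you would use `javax.persistence.EntityManager`, whose `flush()` and `clear()` methods have the same meaning.

```java
import java.time.Instant;
import java.util.List;

public class BatchTimestampJob {

    // Stand-in for javax.persistence.EntityManager, so the sketch compiles alone.
    interface EntityManager {
        Task find(Class<Task> type, Long id);
        void flush(); // push the pending UPDATEs to the database
        void clear(); // detach everything so the persistence context stays small
    }

    // Hypothetical entity; in the real application this would be your @Entity class.
    static class Task {
        Long id;
        Instant updatedAt;
    }

    static final int BATCH_SIZE = 1000; // tune together with the JDBC batch size

    static void stampAll(EntityManager em, List<Long> ids) {
        int processed = 0;
        for (Long id : ids) {
            Task t = em.find(Task.class, id); // managed entity: no detach/merge round trip
            t.updatedAt = Instant.now();
            if (++processed % BATCH_SIZE == 0) {
                em.flush(); // write this chunk out
                em.clear(); // free the context before the next chunk
            }
        }
        em.flush(); // flush the final partial batch; commit stays outside, so it is still one transaction
    }
}
```

Note that the commit is not inside the loop: flushing and clearing only bounds memory and SQL batching, while the surrounding transaction still makes the whole job all-or-nothing.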
Also, as you know, JPA can generate a lot of SQL statements if you have a rich object model with plenty of relationships.
So you may want to check the SQL generated by JPA and make sure all those statements are really necessary.
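If Hibernate is the JPA provider (the property names below are Hibernate's; the batch size value is just an example), you can log the generated SQL and enable JDBC batching with a few properties in persistence.xml:

```properties
hibernate.show_sql=true          # print the generated SQL
hibernate.format_sql=true        # pretty-print it for readability
hibernate.jdbc.batch_size=1000   # let the JDBC driver batch the UPDATEs
hibernate.order_updates=true     # group updates by table so batching actually kicks in
```

Watching that log while the job runs usually shows quickly whether you are issuing far more statements than the one UPDATE per record you expect.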
I could improve the performance of my batch by leaving unnecessary foreign-key relationships alone.