I am currently running into a Java heap space issue. I suspect the cause is the enormous amount of data we are trying to process in our Java batch utility program.
The batch program works like this: it grabs a list of data objects from the underlying datastore using the Hibernate O/R mapping framework and captures them in a List.
Once it has fetched this collection, it processes each object in turn; the business logic operates on each retrieved data object.
The problem is the number of objects retrieved, which is gigantic: over 200,000. No matter how much I increase the heap size, it always falls short.
The order of processing is not important, and we could even process the objects in parallel, since there is no interdependency between any two of them.
I would like your thoughts on the best strategy to handle this scenario.
Is it possible to change the way you get your data from the data store? Instead of grabbing all 200K objects at once and then processing them, can you change it to grab 200, or maybe 2,000, process them, and then grab another batch?
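That batched-fetch loop might look roughly like the sketch below. The in-memory list here is just a stand-in for your datastore, and `fetchPage`/`processAllInBatches` are hypothetical names; with Hibernate, the paging would typically be done with `setFirstResult`/`setMaxResults` on the query, and you would clear the session between pages so already-processed entities can be garbage-collected:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BatchFetcher {

    static final int BATCH_SIZE = 2000;

    // Stand-in for the datastore query. With Hibernate this would be roughly:
    //   session.createQuery("from DataObject")
    //          .setFirstResult(offset)
    //          .setMaxResults(BATCH_SIZE)
    //          .list();
    static List<Integer> fetchPage(List<Integer> store, int offset, int max) {
        if (offset >= store.size()) {
            return List.of();                        // past the end: no more rows
        }
        return store.subList(offset, Math.min(offset + max, store.size()));
    }

    // Fetch and process one page at a time so only BATCH_SIZE objects
    // are held in memory at once, instead of all 200K.
    static int processAllInBatches(List<Integer> store) {
        int processed = 0;
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(store, offset, BATCH_SIZE);
            if (page.isEmpty()) {
                break;                               // datastore exhausted
            }
            for (Integer obj : page) {
                processed++;                         // business logic goes here
            }
            // With Hibernate, call session.clear() here so the entities
            // just processed become eligible for garbage collection.
            offset += page.size();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> store =
            IntStream.range(0, 200_000).boxed().collect(Collectors.toList());
        System.out.println(processAllInBatches(store));
    }
}
```

The key point is that the working set is bounded by the page size, not the total row count, so the heap requirement stays flat no matter how many rows the table holds.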
Are you saying grab the first 200 data objects, finish processing them, then grab the next 200, and so on?
There is nothing like this in the current implementation. It would mean incorporating logic to track the status of each sub-batch (200 data objects) and then fire a fresh request to fetch the next batch.
Are there any best practices in partitioning/parallel processing that could be leveraged here?
John de Michele
Yes, that sounds right. You could also partition the processing across threads (roughly one per CPU) instead of processing serially, which would probably give you a performance boost.
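Since the objects are independent, a fixed thread pool sized to the CPU count is a natural fit. A minimal sketch, assuming each sub-batch can be handed to a worker as-is (the `processInParallel` name and the counter standing in for your business logic are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelBatch {

    // Process independent data objects on a pool of roughly one thread per CPU,
    // splitting the work into sub-batches of batchSize objects each.
    static int processInParallel(List<Integer> objects, int batchSize) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger processed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();

        for (int start = 0; start < objects.size(); start += batchSize) {
            final List<Integer> slice =
                objects.subList(start, Math.min(start + batchSize, objects.size()));
            futures.add(pool.submit(() -> {
                for (Integer obj : slice) {
                    processed.incrementAndGet();     // business logic goes here
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get();                                 // wait; rethrows worker failures
        }
        pool.shutdown();
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data =
            IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println(processInParallel(data, 500));
    }
}
```

One caveat worth noting: Hibernate Sessions are not thread-safe, so if the workers themselves touch the database, each thread needs its own Session rather than sharing the one that did the fetch.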