Out Of Memory error - Report generation of 3 lakh records

Birla Murugesan
Ranch Hand

Joined: Nov 25, 2008
Posts: 66

Hi,

While generating a report over a high volume of records (3 lakh, i.e. 300,000), our application server threw an OutOfMemoryError, crashed, and went down.

We are using a stateless session bean to retrieve these 300,000 records from the database, iterating over the ResultSet and storing each record's values in a Vector. All of these Vector objects (roughly 3 lakh of them) are then returned to the client, a servlet, which iterates over the Vectors again to generate the report.

On analyzing the heap dumps, about 60% of the memory was held by this large number of Vector objects.

So we have decided to process the whole result set in smaller, manageable chunks.

Please recommend any better design patterns or alternatives for implementing this solution.
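Roughly, what we are considering is streaming the ResultSet and writing each row out as we go, something like the sketch below. The query, table, and column names are invented for illustration, and fetch-size behavior varies by JDBC driver:

import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedReportWriter {

    // Streams rows straight into the report instead of building 300,000 Vectors.
    public void writeReport(Connection conn, Writer out) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, name, amount FROM orders",        // invented query
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(500); // hint: fetch ~500 rows per round trip (driver-specific)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Each row is written out immediately, never accumulated in memory.
                    out.write(rs.getLong("id") + ","
                            + rs.getString("name") + ","
                            + rs.getBigDecimal("amount") + "\n");
                }
            }
        }
        out.flush();
    }
}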



William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12760
Doing major computation with large numbers of records inside the servlet request/response cycle is a bad idea.

If this were my problem, I would have the servlet create a separate process that handles a small number of records at a time and writes the results to a file, which could be downloaded when the job is done.

Bill
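
A bare-bones sketch of that idea, assuming a plain javax.servlet container - the report directory, the job-id scheme, and the generateReportInChunks helper are all hypothetical:

import java.io.*;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReportKickoffServlet extends HttpServlet {

    // One worker thread keeps report generation out of the request/response cycle.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        final String jobId = UUID.randomUUID().toString();
        final File target = new File("/var/reports", jobId + ".csv"); // assumed directory

        worker.submit(new Runnable() {
            public void run() {
                try (Writer out = new BufferedWriter(new FileWriter(target))) {
                    generateReportInChunks(out); // hypothetical: streams rows a few at a time
                } catch (Exception e) {
                    // in a real system, log and record the job as failed
                }
            }
        });

        // Return immediately; the client comes back later to download
        // the finished file, e.g. via a "my reports" page keyed by jobId.
        resp.setContentType("text/plain");
        resp.getWriter().println("Report job started: " + jobId);
    }

    private void generateReportInChunks(Writer out) throws Exception {
        // placeholder for chunked JDBC retrieval as discussed above
    }
}

The request thread returns right away; only the background worker ever touches the 300,000 records.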


Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2272

We do a lot of big-data handling, and we generally use two strategies:

a) Do everything on the fly. Forget batch processing. Forget that the database exists. All the data related to reporting is generated as you are processing the data. Generally, when people design the workflow of an application that deals with data, they list out the processing steps: "First I load the data. Then I transform the data. Then I extract the data into a report." The problem is that, traditionally, this is also how software developers have modeled their software: "I'll build a data-load tool that loads the data into the database. Then I'll build a tool that transforms the data in the database. Then I'll build a tool that extracts the data into a report." Each of these steps requires moving the data from one place to another.

A lot of "big data" apps are built by making each data record go through all the steps as soon as it is received. So for every record you ingest, you take it through every step it needs, and all the possible end results are generated in one pass. This minimizes the amount of data flowing back and forth and reduces the memory footprint: you keep only one record in memory at a time. It is also easily parallelizable.

Of course, most applications are not so simple. Many times you have processes that depend on the results of other processes, and you can't run them unless you have results from multiple records available. Yes, true... but you can still do many things on the fly rather than batching everything. Batching everything is lazy, and it leads to problems like "oooh, I have umpteen records to process in the database, what should I do?" Screw the database, man!! Do all your processing before the data gets to the database.
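
To illustrate "one record through all the steps" - the Record and Step types below are hypothetical placeholders:

import java.io.Writer;

public class RecordPipeline {

    // Hypothetical record type; a real one would carry your actual fields.
    public static class Record { /* fields omitted */ }

    public interface Step {
        Record apply(Record in);
    }

    private final Step[] steps;
    private final Writer reportOut;

    public RecordPipeline(Writer reportOut, Step... steps) {
        this.reportOut = reportOut;
        this.steps = steps;
    }

    // Called once per ingested record (from a file reader, queue listener, etc.).
    // Only this one record is ever in memory; nothing waits for a batch job.
    public void onRecord(Record r) throws Exception {
        for (Step step : steps) {
            r = step.apply(r);          // validate/transform/enrich in-line
        }
        reportOut.write(r + "\n");      // report line emitted immediately
    }
}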



b) If you have to batch, and you have no option besides taking the data out of the database, use the tools provided by the database. Databases are designed to store data, and the utilities packaged with them are designed to move data in and out of the database fast. Sure, you could build something in Java that matches the speed and flexibility of Oracle's SQL*Loader, but you would be reinventing the wheel. The downside is that you increase your lock-in with the DB vendor.
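
For example, a batch job could shell out to the database's own loader instead of pushing rows through JDBC. The sqlldr arguments, connect string, and paths here are assumptions for illustration (and hard-coded credentials are not a recommendation):

import java.io.File;

public class BulkLoadRunner {

    // Invokes Oracle's SQL*Loader to do the heavy lifting of moving data.
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "sqlldr",
                "userid=scott/tiger@orcl",       // assumed connect string
                "control=orders.ctl",            // assumed control file
                "log=orders.log");
        pb.directory(new File("/data/loads"));   // assumed working directory
        pb.redirectErrorStream(true);

        Process p = pb.start();
        int exit = p.waitFor();
        System.out.println("sqlldr finished with exit code " + exit);
    }
}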
Birla Murugesan
Ranch Hand

Joined: Nov 25, 2008
Posts: 66
William Brogden wrote:Doing major computation with large numbers of records inside the servlet request/response cycle is a bad idea.

If this were my problem, I would have the servlet create a separate process that handles a small number of records at a time and writes the results to a file, which could be downloaded when the job is done.

Bill




William,
Thanks for your response.
But technically, how can this be implemented?
Please provide your suggestions.
parthasarathy madhira
Ranch Hand

Joined: Aug 31, 2001
Posts: 41
What type of data is being retrieved here? Who is interested in such a long report? Generally speaking, it is a rare requirement for users to go through 300,000 records.
Is it a report or an online search screen?
If it is a report, why are you using servlets and session beans? Why not use a reporting engine, or better still, a data warehouse (DWH)?
If it is an online search screen, you need to add enough search filters to fetch small chunks of data. And if your filters still fetch that many records (which is still hard to make sense of), you should do pagination (just Google "java pagination") and also limit the maximum number of search results to, say, 250 or 500, depending on the available amount of memory.
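
A bare-bones paginated search might look like the sketch below; the table, the columns, and the LIMIT/OFFSET syntax (MySQL/PostgreSQL style) are assumptions for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class OrderSearchDao {

    private static final int MAX_RESULTS = 500;   // hard cap, per the advice above

    // Fetches one page of matching rows instead of the whole result set.
    public List<String> findPage(Connection conn, String nameFilter,
                                 int page, int pageSize) throws SQLException {
        int size = Math.min(pageSize, MAX_RESULTS);
        String sql = "SELECT name FROM orders WHERE name LIKE ? "
                   + "ORDER BY name LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, nameFilter + "%");
            ps.setInt(2, size);
            ps.setInt(3, page * size);
            List<String> results = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString("name"));
                }
            }
            return results;
        }
    }
}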


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12760
But technically, how can this be implemented?


Java is loaded with possible approaches, for example:

You could use Runtime.exec to start a separate process on the same computer - quick and local, but hard to monitor.

You could use JavaSpaces technology, for example GigaSpaces, to distribute jobs all over your network - very powerful, but not simple.

You could farm the whole thing out to a "cloud" database and processes.

You could use the Java Message Service (JMS) as an alternative way to farm jobs out over your network (a rough sketch of this option appears below the list).

You could use Hadoop to distribute the jobs and collect the results.

.....
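
To make the JMS option concrete, here is a minimal job-submission sketch against the JMS 1.1 API - the JNDI names are assumptions that depend on your app server configuration, and the consumer that actually generates the report is not shown:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class ReportJobSender {

    // Hands the report job off to a queue so any worker on the network can run it.
    public void submitJob(String reportRequestId) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/ReportJobs");                             // assumed JNDI name

        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            // A worker elsewhere consumes this message, generates the report
            // in chunks, and stores the result file for later download.
            TextMessage msg = session.createTextMessage(reportRequestId);
            producer.send(msg);
        } finally {
            conn.close();
        }
    }
}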

Bill