Improve Performance by Best design

Bhagat Singh Rawat
Ranch Hand

Joined: Apr 04, 2009
Posts: 93
Hi Ranchers,

I have a requirement where I have to select around 60 million records from the database. Once I have the records in a ResultSet, I have to format some columns as per the client's requirements (date format and number format) and then write all the records to a file on disk.

Currently I am selecting the records one day at a time (7 selects for 7 days) and putting them in a HashMap. Then I read from the HashMap, format some columns, and write the records to a file (a separate file for each day).
Finally I merge all 7 files into a single file.

But this whole process takes 6 hours to complete. To speed it up I have created 7 threads, one per day, with each thread writing its own file.

Again I merge the 7 files into a single file at the end. This brings the process down to 2 hours, but my program runs into an OutOfMemoryError after about an hour.
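In outline, each day's thread does something like this (the class and column names here are made up for the post, but this is the shape of it):

public void exportOneDay(java.sql.Connection con, java.sql.Date day) throws Exception {
    java.sql.PreparedStatement ps = con.prepareStatement(
            "SELECT id, trade_date, amount FROM records WHERE trade_date = ?");
    ps.setDate(1, day);
    java.sql.ResultSet rs = ps.executeQuery();
    // all rows for the day are staged in memory before anything is written
    java.util.Map<Long, String[]> rows = new java.util.HashMap<Long, String[]>();
    while (rs.next()) {
        rows.put(rs.getLong("id"), new String[] {
                rs.getDate("trade_date").toString(),
                rs.getBigDecimal("amount").toString() });
    }
    rs.close();
    ps.close();
    // ... then a second pass formats each entry and writes it to the day's file
}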

Please suggest the best design for this scenario. Should I use some caching mechanism? If yes, which one, and how?

Note: the client doesn't want us to change anything on the database side, such as creating indexes or stored procedures.

Thanks in advance.

[NK: Removed the blue font colour. It makes it difficult to read the post]


Brainbench Java 2.0 Fundamentals, J2EE 1.4
http://www.brainbench.com/transcript.jsp?pid=8192792
Andy Hahn
Ranch Hand

Joined: Aug 31, 2004
Posts: 225
Why build the records up in memory before writing the file? Just stream them to a file on the file system. If you plan on pulling all those millions of records into memory first, then you had better have some serious hardware and JVM configuration.
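Something like this sketch, for example. The connection URL, query, and column names are all invented here; the point is that only one row is ever held in memory:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingExport {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mydb://dbhost/mydb", "user", "password"); // made-up URL
        PreparedStatement ps = con.prepareStatement(
                "SELECT id, trade_date, amount FROM records");  // made-up query
        ps.setFetchSize(1000); // hint the driver to fetch rows in chunks, not all at once
        ResultSet rs = ps.executeQuery();
        BufferedWriter out = new BufferedWriter(new FileWriter("export.csv"));
        try {
            while (rs.next()) {
                // write each row as soon as it is read; nothing accumulates in memory
                out.write(rs.getLong("id") + "," + rs.getDate("trade_date")
                        + "," + rs.getBigDecimal("amount"));
                out.newLine();
            }
        } finally {
            out.close();
            rs.close();
            ps.close();
            con.close();
        }
    }
}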
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

This is a big export, so I have to ask: why are you using JDBC at all? Databases come with bulk import/export tools better than anything you are likely to write yourself, so why not just use them?


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 16019

Paul Sturrock wrote:This is a big export, so I have to ask: why are you using JDBC at all? Databases come with bulk import/export tools better than anything you are likely to write yourself, so why not just use them?


Unfortunately, these tools aren't always made freely available to us peons. I could only use SQL*Loader by logging into a remote machine for example. But certainly it's an option.

Another possibility is to use an ETL tool such as the Kettle app from Pentaho. It should be capable of doing the required extraction and transformations and you can use a GUI designer (Spoon) to construct the extraction and transformation rules. The Pentaho suite is written in Java, and can be extended via custom Java code, but there's nothing exotic enough for this particular project to need that. And it already has various tricks built in to enhance the performance of the process.

Definitely streaming is the way to go. I don't recommend dumping this much data into RAM unless you intend to do some heavy cross-referencing, which is probably better done on the database back end in any case.


Customer surveys are for companies who didn't pay proper attention to begin with.
Swapnil Shroff
Ranch Hand

Joined: Mar 07, 2006
Posts: 58
If core Java is the only option available to you, then I would bring the batch size down and free resources as soon as processing is done.
For example, free the ResultSet once you have built the hash table, and free the hash table objects as soon as you have written them to the file.

Also, why do you need a hash table at all? You can just read from the ResultSet, process the date columns, and write to the file; that way you would save some memory.
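For example, a rough sketch of formatting on the fly. The patterns and column names here are just placeholders for whatever formats the client wants:

import java.io.Writer;
import java.sql.ResultSet;
import java.text.DecimalFormat;
import java.text.SimpleDateFormat;

public class RowFormatter {
    // placeholder patterns; substitute the client's actual formats
    private final SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy");
    private final DecimalFormat numberFormat = new DecimalFormat("#,##0.00");

    // formats one row straight from the ResultSet to the writer,
    // so no intermediate hash table is needed
    public void writeRow(ResultSet rs, Writer out) throws Exception {
        out.write(rs.getString("id"));
        out.write(',');
        out.write(dateFormat.format(rs.getDate("trade_date")));
        out.write(',');
        out.write(numberFormat.format(rs.getBigDecimal("amount")));
        out.write('\n');
    }
}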


SCJP 5, SCDJWS

It's amazing how premature optimisation is both seductive and destructive; even when you know
 