Improve Performance by Best Design

 
Ranch Hand
Posts: 93
Hi Ranchers,

I have a requirement to select around 60 million records from a database. Once I have all the records in a ResultSet, I have to format some columns per the client's requirements (date format and number format) and then write all the records to a file (secondary storage).

Currently I am selecting records on a per-day basis (7 selects for 7 days) and putting them in a HashMap, then reading from the HashMap, formatting some columns, and finally writing to a file (a separate file for each of the 7 days). Finally I merge all 7 files into a single file.

This whole process takes 6 hours to complete. To speed it up, I created 7 threads, one per day, with each thread writing its own file, and then merged the 7 files into one as before. That brings the time down to 2 hours, but now my program runs out of memory (OutOfMemoryError) after an hour or so.

Please suggest the best design for this scenario. Should I use some caching mechanism? If yes, which one, and how?

Note: the client doesn't want to change anything on the database side, such as creating indexes or stored procedures.

Thanks in advance.

[NK: Removed the blue font colour. It makes it difficult to read the post]
 
Ranch Hand
Posts: 225
Why build them up in memory at all? Just stream them to a file on the file system as you read them. If you plan on pulling all those millions of records into memory first, then you had better have some serious hardware and JVM configuration.
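To make the streaming idea concrete, here's a minimal sketch. The row source is a placeholder; in the real program the loop would iterate `resultSet.next()` instead of an Iterable, but the point is the same: each row goes straight to disk and nothing accumulates in a HashMap.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingExport {

    // Writes each row out as soon as it is produced, so memory use stays
    // constant no matter how many rows there are. In the real program the
    // Iterable would be replaced by a loop over resultSet.next().
    static void export(Iterable<String> rows, Path target) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        }
    }
}
```

BufferedWriter batches the actual disk writes, so writing row by row does not mean one system call per row.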
 
Bartender
Posts: 10336
This is a big export, so I have to ask: why are you using JDBC at all? Databases come with bulk import/export tools that are better than anything you are likely to write yourself, so why not just use them?
 
Saloon Keeper
Posts: 26300

Paul Sturrock wrote: This is a big export, so I have to ask: why are you using JDBC at all? Databases come with bulk import/export tools that are better than anything you are likely to write yourself, so why not just use them?


Unfortunately, these tools aren't always made freely available to us peons. I could only use SQL*Loader by logging into a remote machine, for example. But it's certainly an option.

Another possibility is to use an ETL tool such as the Kettle app from Pentaho. It should be capable of doing the required extraction and transformations and you can use a GUI designer (Spoon) to construct the extraction and transformation rules. The Pentaho suite is written in Java, and can be extended via custom Java code, but there's nothing exotic enough for this particular project to need that. And it already has various tricks built in to enhance the performance of the process.

Definitely streaming is the way to go. I don't recommend dumping this much data into RAM unless you intend to do some heavy cross-referencing. Which is probably better done on the database backend in any case.
 
Ranch Hand
Posts: 58
If core Java is the only option available to you, then I would bring the batch size down and free resources as soon as processing is done. For example, free the ResultSet once you have built the hash table, and free the hash table entries as soon as you have written them to the file.

Also, why do you need a hash table at all? You can read straight from the ResultSet, format the date columns, and write to the file; that way you save all of that memory.
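A sketch of that per-row approach. The column layout and formats here are made up for illustration; in the real loop you would call something like `writer.write(formatRow(rs.getString(1), ...))` followed by `writer.newLine()` once per `rs.next()`, with no intermediate collection.

```java
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DirectFormatter {

    // DateTimeFormatter is immutable and thread-safe; NumberFormat is NOT,
    // so with the 7-thread design each thread needs its own instance
    // (this static field assumes single-threaded use).
    private static final DateTimeFormatter DATE_FMT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy");
    private static final NumberFormat NUM_FMT =
            NumberFormat.getNumberInstance(Locale.US);

    // Formats one row into its output line; called once per ResultSet row,
    // so no HashMap (and no second pass over the data) is needed.
    static String formatRow(String id, LocalDate date, double amount) {
        return id + "," + DATE_FMT.format(date) + "," + NUM_FMT.format(amount);
    }
}
```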
 