
Performance issue while loading a 19 GB file into memory

sag rusty
Greenhorn

Joined: Aug 14, 2011
Posts: 2
I have a file containing 19 GB of data (fixed-length records, ~318 million of them).
Loading this file takes ~2 hours using NIO with a ByteBuffer of 651 bytes (the size of one record).
How do I reduce the loading time?
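
For reference, a per-record NIO read loop of the sort described might look like the sketch below; the file path and processRecord are placeholders, not the poster's actual code:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioRecordLoader {
    static final int RECORD_SIZE = 651;   // one record, per the post above

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("records.dat"),
                                               StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
            while (ch.read(buf) != -1) {
                if (!buf.hasRemaining()) {   // a complete record has been read
                    buf.flip();
                    processRecord(buf);
                    buf.clear();
                }
            }
        }
    }

    static void processRecord(ByteBuffer record) {
        // placeholder for whatever per-record work is actually done
    }
}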
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

Hi and welcome to the JavaRanch!

Question: why do you want to load those 19 GB into memory? What do you want to do with the data?


"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646

What do you mean by "loading" time? I've never had a computer with enough RAM to load a 19 GB file into memory.
Do you mean "read it all into memory", or more like "read and process each record"?

I sure would figure out an algorithm that lets me process the file as I load it. Even if you had a big enough chunk of RAM, the GC time would kill you.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12759
On what basis did you come to focus on NIO and buffer size as the limiting factor?

Bill
sag rusty
Greenhorn

Joined: Aug 14, 2011
Posts: 2
Hi all,
Thanks for the replies.
By "loading" I mean: I read each and every record from the 19 GB file and, based on some filters, write the matching records to a file on disk.

@William: I took 64 MB of data from this file and tried to "load" it with java.io; that took ~18 minutes. With java.nio and a buffer size of 651 bytes (one record) it took ~1.4 minutes,
so I decided to go with NIO for the whole 19 GB file.

Thanks.
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3573

How about using a simple BufferedInputStream wrapped around a FileInputStream; read each record and deal with it before you read the next record?
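
A minimal sketch of that suggestion, assuming the 651-byte fixed-length records from the original post; the file path, buffer size, and handleRecord are placeholders:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class StreamingReader {
    static final int RECORD_SIZE = 651;   // fixed record length from the original post

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        // "records.dat" is a placeholder path; the 64 KB buffer size is a
        // guess worth tuning, not a figure from the thread.
        try (BufferedInputStream in = new BufferedInputStream(
                new FileInputStream("records.dat"), 64 * 1024)) {
            while (readRecord(in, record)) {
                handleRecord(record);   // deal with this record before reading the next
            }
        }
    }

    // Fills the record array, looping because read() may return fewer bytes
    // than requested; returns false at end of file.
    static boolean readRecord(BufferedInputStream in, byte[] record) throws IOException {
        int filled = 0;
        while (filled < record.length) {
            int n = in.read(record, filled, record.length - filled);
            if (n == -1) return false;   // EOF (a trailing partial record is dropped)
            filled += n;
        }
        return true;
    }

    static void handleRecord(byte[] record) {
        // placeholder for the real per-record logic
    }
}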
Rishi Shehrawat
Ranch Hand

Joined: Aug 11, 2010
Posts: 218

Have you looked at a streaming-based approach, i.e. read a record, process it, and write it out, rather than reading the complete file into memory and then processing it?
This approach shrinks the memory footprint and also improves performance; streaming is the usual way to process large datasets in high-performance batch jobs.
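
A sketch of such a single-pass read-filter-write job, again assuming the 651-byte records; the file names, buffer sizes, and filter condition are all made up:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class FilterCopy {
    static final int RECORD_SIZE = 651;

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                 new FileInputStream("input.dat"), 256 * 1024));
             BufferedOutputStream out = new BufferedOutputStream(
                 new FileOutputStream("filtered.dat"), 256 * 1024)) {
            try {
                while (true) {
                    in.readFully(record);          // one full 651-byte record
                    if (matchesFilter(record)) {   // stand-in for the real filters
                        out.write(record);
                    }
                }
            } catch (EOFException endOfInput) {
                // readFully signals end of file this way
            }
        }
    }

    static boolean matchesFilter(byte[] record) {
        return record[0] != 0;   // placeholder condition
    }
}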

William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12759
My approach would be to use 3 threads, as follows.

Thread 1 reads records to maintain a queue of work waiting to be processed - I would certainly try a buffered input stream and test buffer sizes MUCH larger than a single record.
Thread 2 grabs a record from the queue, decides how to treat it, and writes it to the output queue.
Thread 3 manages the output queue and writes records only when a reasonable number have accumulated.

This approach recognizes that you want to minimize calls to the operating system IO (a sketch along these lines follows this post).

I would certainly look into java.util.concurrent.SynchronousQueue for Thread 1.

Bill
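
A rough sketch of that three-thread pipeline. One deliberate swap: it uses bounded ArrayBlockingQueues instead of the SynchronousQueue Bill mentions (SynchronousQueue is a zero-capacity hand-off, so the reader would block on every single record); the file names, queue capacities, and filter condition are placeholders.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreePhasePipeline {
    static final int RECORD_SIZE = 651;
    static final byte[] POISON = new byte[0];   // sentinel marking end of stream

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> toFilter = new ArrayBlockingQueue<>(1024);
        BlockingQueue<byte[]> toWriter = new ArrayBlockingQueue<>(1024);

        // Thread 1: read records and feed the work queue
        Thread reader = new Thread(() -> {
            try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                    new FileInputStream("input.dat"), 256 * 1024))) {
                while (true) {
                    byte[] rec = new byte[RECORD_SIZE];
                    in.readFully(rec);
                    toFilter.put(rec);
                }
            } catch (EOFException eof) {
                // normal end of input
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                putQuietly(toFilter, POISON);
            }
        });

        // Thread 2: decide how to treat each record
        Thread filter = new Thread(() -> {
            try {
                for (byte[] rec; (rec = toFilter.take()) != POISON; ) {
                    if (rec[0] != 0) {          // placeholder filter condition
                        toWriter.put(rec);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                putQuietly(toWriter, POISON);
            }
        });

        // Thread 3: write accepted records; the buffered stream batches OS calls
        Thread writer = new Thread(() -> {
            try (BufferedOutputStream out = new BufferedOutputStream(
                    new FileOutputStream("filtered.dat"), 256 * 1024)) {
                for (byte[] rec; (rec = toWriter.take()) != POISON; ) {
                    out.write(rec);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        reader.start(); filter.start(); writer.start();
        reader.join(); filter.join(); writer.join();
    }

    static void putQuietly(BlockingQueue<byte[]> q, byte[] item) {
        try { q.put(item); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The bounded queues also give back-pressure: if filtering or writing falls behind, the reader blocks on put() instead of buffering an unbounded slice of the 19 GB file in memory.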
 