Performance issue while loading a 19 GB file into memory
sag rusty
Greenhorn
Joined: Aug 14, 2011
Posts: 2
I have a file containing 19 GB of data (~318 million fixed-length records).
Loading this file takes ~2 hours using NIO with a ByteBuffer of 651 bytes (the size of one record).
How do I reduce the loading time ?
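A buffer the size of one record means one read call per record: for a 19 GB file of 651-byte records that is on the order of 30 million calls (19 GB / 651 bytes). A minimal sketch of the usual alternative, assuming fixed 651-byte records and a roughly 1 MiB direct buffer (an arbitrary, untuned size), fills the buffer with many records per `FileChannel.read` and walks through them in memory. `BulkRecordReader` and `forEachRecord` are hypothetical names, not something from this thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

public class BulkRecordReader {
    static final int RECORD_SIZE = 651;        // fixed record length from the post
    static final int RECORDS_PER_READ = 1610;  // 1610 * 651 = 1,048,110 bytes, about 1 MiB (assumed size)

    /** Reads the file in large chunks and hands each complete record to the handler; returns the record count. */
    public static long forEachRecord(Path file, Consumer<ByteBuffer> handler) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(RECORD_SIZE * RECORDS_PER_READ);
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {
                buf.flip();
                // Hand out every complete record currently in the buffer.
                while (buf.remaining() >= RECORD_SIZE) {
                    ByteBuffer record = buf.slice();   // view starting at the current record
                    record.limit(RECORD_SIZE);
                    handler.accept(record);
                    buf.position(buf.position() + RECORD_SIZE);
                    count++;
                }
                // Move any partial record to the front so the next read completes it.
                buf.compact();
            }
        }
        return count;
    }
}
```

Each `ByteBuffer` passed to the handler is a view into the shared buffer and is only valid during the callback, so copy it if the data must outlive the call. Trailing bytes that do not form a complete record are ignored.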
Wouter Oet
Saloon Keeper
Joined: Oct 25, 2008
Posts: 2700
Hi and welcome to the JavaRanch!
Question: why do you want to load those 19 GB into memory? What do you want to do with it?
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Pat Farrell
Rancher
Joined: Aug 11, 2007
Posts: 3688
What do you mean by "loading" time? I've never had a computer with enough RAM to load a 19 GB file into memory.
Do you mean "read it all into memory", or more like "read and process each record"?
I would certainly figure out an algorithm that lets me process the file as I load it. Even if you had a big enough chunk of RAM, the GC time would kill you.
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
On what basis did you come to focus on NIO and buffer size as the limiting factor?
Bill
Java Resources at www.wbrogden.com
sag rusty
Greenhorn
Joined: Aug 14, 2011
Posts: 2
Hi all,
Thanks for the replies.
By "loading" I mean that I read every record from the 19 GB file and, based on some filters, write the matching records to a file on disk.
@William: I extracted just 64 MB of data from this file and tried to "load" it. With java.io it took ~18 min; with java.nio and a buffer size of 651 bytes (the record size) it took ~1.4 min,
so I decided to go with NIO for the whole 19 GB file.
Thanks.
Stephan van Hulst
Bartender
Joined: Sep 20, 2010
Posts: 2771
How about using a simple BufferedInputStream wrapped around a FileInputStream; read each record and deal with it before you read the next record?
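A minimal sketch of that suggestion, assuming 651-byte records and a 1 MiB stream buffer (both values to tune, not taken from the thread). `DataInputStream.readFully` keeps reading until the whole record is filled, so short reads from the underlying stream are handled for you; `StreamRecordReader` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.function.Consumer;

public class StreamRecordReader {
    static final int RECORD_SIZE = 651;   // fixed record length from the thread

    /** Reads fixed-length records sequentially and passes each to handler; returns the record count. */
    public static long forEachRecord(String path, Consumer<byte[]> handler) throws IOException {
        byte[] record = new byte[RECORD_SIZE];   // reused for every record
        long count = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 1 << 20))) {   // 1 MiB buffer (assumed)
            while (true) {
                try {
                    in.readFully(record);        // fills all 651 bytes or throws EOFException
                } catch (EOFException endOfFile) {
                    break;                       // end of input; a trailing partial record is dropped
                }
                handler.accept(record);          // deal with this record before reading the next
                count++;
            }
        }
        return count;
    }
}
```

Because the single `record` array is reused, a handler that needs to keep the data must copy it.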
Rishi Shehrawat
Ranch Hand
Joined: Aug 11, 2010
Posts: 218
Have you looked at a streaming approach, i.e. read a record, process it and write it, as opposed to reading the complete file into memory and then processing it?
This approach will reduce the memory footprint and also improve performance. Streaming is commonly used for high-performance batch processing of large datasets.
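For the task described earlier in the thread (read every record, keep those that pass a filter, write them to disk), a streaming version might look like the minimal sketch below. It assumes 651-byte records, Java 9+ for `InputStream.readNBytes`, and a filter supplied as a `Predicate<byte[]>`; `RecordFilter` and `filter` are hypothetical names. Only one record is held in memory at a time, and both streams use large buffers so the operating system sees few, large I/O calls.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.Predicate;

public class RecordFilter {
    static final int RECORD_SIZE = 651;   // fixed record length from the thread
    static final int BUFFER = 1 << 20;    // 1 MiB stream buffers (assumed, tune as needed)

    /** Streams inPath to outPath, keeping only records accepted by keep; returns the number written. */
    public static long filter(String inPath, String outPath, Predicate<byte[]> keep) throws IOException {
        byte[] record = new byte[RECORD_SIZE];   // one record in memory at a time
        long written = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(inPath), BUFFER);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(outPath), BUFFER)) {
            // readNBytes returns fewer than RECORD_SIZE bytes only at end of file
            while (in.readNBytes(record, 0, RECORD_SIZE) == RECORD_SIZE) {
                if (keep.test(record)) {
                    out.write(record);
                    written++;
                }
            }
        }
        return written;
    }
}
```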
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
My approach would be to use 3 Threads as follows.
Thread 1 reads records to maintain a queue waiting to be processed - I would certainly try a buffered input stream and test buffer sizes MUCH larger than a single record
Thread 2 grabs a record from the queue, decides how to treat it and writes it to the output queue
Thread 3 manages the output queue and writes records only when a reasonable number have accumulated
This approach recognizes that you want to minimize calls to the operating system IO.
I would certainly look into java.util.concurrent.SynchronousQueue for Thread 1
Bill
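The three-thread design above can be sketched as follows, assuming 651-byte records, Java 9+, and a caller-supplied `Predicate<byte[]>` filter; `ThreeStagePipeline`, the queue capacities and the batch size are hypothetical choices. One deliberate swap: a `SynchronousQueue` has no capacity (each `put` waits for a matching `take`), so it cannot hold a backlog of records; this sketch uses bounded `ArrayBlockingQueue`s instead. An empty array acts as an end-of-input marker passed from stage to stage.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ThreeStagePipeline {
    static final int RECORD_SIZE = 651;
    static final byte[] END = new byte[0];   // "poison pill": tells the next stage the input is finished

    /** Reader -> filter -> writer, connected by bounded queues; returns the number of records written. */
    public static long run(String inPath, String outPath, Predicate<byte[]> keep) throws Exception {
        BlockingQueue<byte[]> toFilter = new ArrayBlockingQueue<>(10_000);
        BlockingQueue<byte[]> toWrite = new ArrayBlockingQueue<>(10_000);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<Long> stages = new ExecutorCompletionService<>(pool);
        try {
            // Thread 1: read records through a large buffer and queue them.
            stages.submit(() -> {
                long n = 0;
                try (InputStream in = new BufferedInputStream(new FileInputStream(inPath), 1 << 20)) {
                    while (true) {
                        byte[] rec = new byte[RECORD_SIZE];   // new array per record: it crosses threads
                        if (in.readNBytes(rec, 0, RECORD_SIZE) < RECORD_SIZE) break;
                        toFilter.put(rec);
                        n++;
                    }
                }
                toFilter.put(END);
                return n;
            });
            // Thread 2: decide what to do with each record and pass accepted ones on.
            stages.submit(() -> {
                long n = 0;
                for (byte[] rec = toFilter.take(); rec != END; rec = toFilter.take()) {
                    if (keep.test(rec)) {
                        toWrite.put(rec);
                        n++;
                    }
                }
                toWrite.put(END);
                return n;
            });
            // Thread 3: write in batches of whatever has accumulated in the output queue.
            Future<Long> writer = stages.submit(() -> {
                long n = 0;
                List<byte[]> batch = new ArrayList<>();
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outPath), 1 << 20)) {
                    boolean done = false;
                    while (!done) {
                        batch.add(toWrite.take());       // wait for at least one record
                        toWrite.drainTo(batch, 1_000);   // then grab up to 1,000 more without blocking
                        for (byte[] rec : batch) {
                            if (rec == END) { done = true; break; }
                            out.write(rec);
                            n++;
                        }
                        batch.clear();
                    }
                }
                return n;
            });
            // Wait for all three stages; the first failure is rethrown as an ExecutionException.
            for (int i = 0; i < 3; i++) stages.take().get();
            return writer.get();
        } finally {
            pool.shutdownNow();   // interrupts any stage still blocked on a queue after a failure
        }
    }
}
```

Whether three threads beat a single read-filter-write loop depends on how expensive the filter is; if the job is purely disk-bound, the extra threads mostly add queue overhead, so it is worth measuring both.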