I have a file containing 19 GB of data (fixed-length records, ~318 million of them).
Loading this file takes ~2 hours using NIO with a ByteBuffer of 651 bytes (the size of one record).
How can I reduce the loading time?
On what basis did you come to focus on NIO and buffer size as the limiting factor?
Thanks for the reply.
By "loading" I mean that I read every record from the 19 GB file and, based on some filters, write the matching records to a file on disk.
@William: I took just 64 MB of data from this file and tried to "load" it. With java.io it took ~18 minutes; with java.nio and a buffer size of 651 bytes (the record size) it took ~1.4 minutes,
so I decided to go with NIO for the whole 19 GB file.
Have you looked at a streaming approach, i.e. read a record, process it, and write it, as opposed to reading the complete file into memory and then processing it?
This approach reduces the memory footprint and can also improve performance. Streaming is commonly used for high-performance batch processing of large datasets.
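For what it's worth, here is a minimal sketch of such a read-process-write loop using NIO, with one difference from the original post: the buffer holds many records instead of one, so each call into the operating system moves several megabytes. The class name StreamingFilter, the RECORDS_PER_READ value, and the keep predicate are all assumptions made up for illustration; only the 651-byte record size comes from the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Predicate;

class StreamingFilter {
    static final int RECORD_SIZE = 651;          // record size from the original post
    static final int RECORDS_PER_READ = 16_384;  // assumption: ~10 MB per read, tune by measuring

    /** Copies every record accepted by keep from in to out; returns the number written. */
    static long filter(Path in, Path out, Predicate<byte[]> keep) throws IOException {
        ByteBuffer inBuf = ByteBuffer.allocateDirect(RECORD_SIZE * RECORDS_PER_READ);
        ByteBuffer outBuf = ByteBuffer.allocateDirect(RECORD_SIZE * RECORDS_PER_READ);
        byte[] record = new byte[RECORD_SIZE];   // reused for every record
        long written = 0;
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            boolean eof = false;
            while (!eof) {
                eof = src.read(inBuf) == -1;     // one large read instead of one per record
                inBuf.flip();
                while (inBuf.remaining() >= RECORD_SIZE) {
                    inBuf.get(record);
                    if (keep.test(record)) {
                        if (outBuf.remaining() < RECORD_SIZE) {
                            drain(outBuf, dst);  // write only when the output buffer is full
                        }
                        outBuf.put(record);
                        written++;
                    }
                }
                inBuf.compact();                 // keep a partial record for the next read
            }
            drain(outBuf, dst);                  // flush whatever is left
        }
        return written;
    }

    private static void drain(ByteBuffer buf, FileChannel dst) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            dst.write(buf);
        }
        buf.clear();
    }
}
```

A call might look like StreamingFilter.filter(inputPath, outputPath, r -> r[0] == 'A'), where the lambda stands in for whatever the real filter is. The predicate sees a reused array, so it should not hold on to it. Incomplete trailing bytes at the end of the file are ignored.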
My approach would be to use three threads, as follows.
Thread 1 reads records to maintain a queue waiting to be processed. I would certainly try a buffered input stream and test buffer sizes MUCH larger than a single record.
Thread 2 grabs a record from the queue, decides how to treat it, and writes it to the output queue.
Thread 3 manages the output queue and writes records only when a reasonable number have accumulated.
This approach recognizes that you want to minimize calls to the operating system's I/O.
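One way the three-thread arrangement could be sketched is shown below. It is an illustration under assumptions, not a tuned implementation: the class name RecordPipeline, the 8 MB buffer size, the batch size, the queue capacity, and the keep predicate are all made up for the example, and records travel through the queues in batches rather than one at a time to keep queue overhead low. It needs Java 11 or later for InputStream.readNBytes(int).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Predicate;

class RecordPipeline {
    static final int RECORD_SIZE = 651;               // record size from the original post
    static final int BUFFER_BYTES = 8 * 1024 * 1024;  // assumption: much larger than one record
    static final int BATCH = 1024;                    // assumption: records per queue entry
    private static final List<byte[]> END = new ArrayList<>(); // end-of-stream marker

    /** Runs reader, filter and writer threads; returns the number of records written. */
    static long run(Path in, Path out, Predicate<byte[]> keep) throws Exception {
        BlockingQueue<List<byte[]>> toFilter = new ArrayBlockingQueue<>(16);
        BlockingQueue<List<byte[]>> toWrite = new ArrayBlockingQueue<>(16);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<Long> stages = new ExecutorCompletionService<>(pool);
        try {
            stages.submit(() -> read(in, toFilter));
            stages.submit(() -> filter(toFilter, toWrite, keep));
            Future<Long> writer = stages.submit(() -> write(out, toWrite));
            for (int i = 0; i < 3; i++) {
                stages.take().get();   // rethrows the first failure from any stage
            }
            return writer.get();
        } finally {
            pool.shutdownNow();        // interrupts stages still blocked after a failure
        }
    }

    // Thread 1: large buffered reads, records handed over in batches.
    static long read(Path in, BlockingQueue<List<byte[]>> q)
            throws IOException, InterruptedException {
        long count = 0;
        try (InputStream is = new BufferedInputStream(Files.newInputStream(in), BUFFER_BYTES)) {
            List<byte[]> batch = new ArrayList<>(BATCH);
            while (true) {
                byte[] rec = is.readNBytes(RECORD_SIZE);
                if (rec.length < RECORD_SIZE) break;  // end of file or partial trailing record
                batch.add(rec);
                count++;
                if (batch.size() == BATCH) {
                    q.put(batch);
                    batch = new ArrayList<>(BATCH);
                }
            }
            if (!batch.isEmpty()) q.put(batch);
        }
        q.put(END);
        return count;
    }

    // Thread 2: decides which records to keep and passes them on.
    static long filter(BlockingQueue<List<byte[]>> in, BlockingQueue<List<byte[]>> out,
                       Predicate<byte[]> keep) throws InterruptedException {
        long kept = 0;
        for (List<byte[]> batch = in.take(); batch != END; batch = in.take()) {
            batch.removeIf(keep.negate());
            kept += batch.size();
            if (!batch.isEmpty()) out.put(batch);
        }
        out.put(END);
        return kept;
    }

    // Thread 3: the buffered stream only hits the OS once BUFFER_BYTES have accumulated.
    static long write(Path out, BlockingQueue<List<byte[]>> q)
            throws IOException, InterruptedException {
        long written = 0;
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(out), BUFFER_BYTES)) {
            for (List<byte[]> batch = q.take(); batch != END; batch = q.take()) {
                for (byte[] rec : batch) os.write(rec);
                written += batch.size();
            }
        }
        return written;
    }
}
```

The bounded queues keep the reader from running far ahead of the writer, so memory use stays limited no matter how large the input file is. Whether three threads actually beat a single-threaded loop with large buffers depends on how expensive the filter is compared with the disk, so it is worth timing both.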