
Performance issue while loading a 19 GB file into memory

 
sag rusty
Greenhorn
Posts: 2
I have a file containing 19 GB of data (fixed-length records, ~318 million of them).
Loading this file takes ~2 hours using NIO with a ByteBuffer of 651 bytes (the size of one record).
How do I reduce the loading time?
 
Wouter Oet
Saloon Keeper
Posts: 2700
Hi and welcome to the JavaRanch!

Question: why do you want to load those 19 GB into memory? What do you want to do with it?
 
Pat Farrell
Rancher
Posts: 4678
What do you mean by "loading" time? I've never had a computer with enough RAM to load a 19 GB file into memory.
Do you mean "read it all into memory" or more like "read and process each record"?

I sure would figure out an algorithm that lets me process the file as I load it. Even if you had a big enough chunk of RAM, the GC time will kill you.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13064
On what basis did you come to focus on NIO and buffer size as the limiting factor?

Bill
 
sag rusty
Greenhorn
Posts: 2
Hi all,
Thanks for the replies.
By "loading" I mean: I will read each and every record from the 19 GB file and, based on some filters, write those records to a file on disk.

@William: I took only 64 MB of data from this file and tried to "load" it with java.io; that took ~18 minutes. With java.nio and a buffer size of 651 bytes (the record size) it took ~1.4 minutes,
so I decided to go with NIO for the whole 19 GB file.

Thanks.
 
Stephan van Hulst
Bartender
Posts: 5888
How about using a simple BufferedInputStream wrapped around a FileInputStream? Read each record and deal with it before you read the next one.
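
A rough sketch of that idea, assuming the 651-byte fixed-length records from the original post (Java 9+ for readNBytes; the file name and the processing step are placeholders):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class RecordReader {
    private static final int RECORD_SIZE = 651; // fixed record length from the original post

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        // A large buffer means far fewer OS read calls than one per record
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream("bigfile.dat"), 1 << 20)) {
            while (in.readNBytes(record, 0, RECORD_SIZE) == RECORD_SIZE) {
                process(record); // deal with this record before reading the next one
            }
        }
    }

    private static void process(byte[] record) {
        // placeholder: apply filters, write matching records, etc.
    }
}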
 
Rishi Shehrawat
Ranch Hand
Posts: 218
Have you looked at a streaming-based approach, i.e. read a record, process it, and write it out, rather than reading the complete file into memory and then processing it?
This approach will reduce the memory footprint and should also improve performance. Streaming is the standard approach for high-performance batch processing of large datasets.
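
The same idea with the write side included, as a sketch under the 651-byte record assumption (the file names and the filter are stand-ins):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamingFilter {
    private static final int RECORD_SIZE = 651;

    public static void main(String[] args) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream("bigfile.dat"), 1 << 20);
             BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream("filtered.dat"), 1 << 20)) {
            while (in.readNBytes(record, 0, RECORD_SIZE) == RECORD_SIZE) {
                if (matchesFilter(record)) {
                    out.write(record); // only one record is ever held in memory
                }
            }
        }
    }

    private static boolean matchesFilter(byte[] record) {
        return record[0] != 0; // placeholder for the real filter
    }
}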

 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13064
My approach would be to use 3 Threads as follows.

Thread 1 reads records to maintain a queue waiting to be processed - I would certainly try a buffered input stream and test buffer sizes MUCH larger than a single record
Thread 2 grabs a record from the queue, decides how to treat it, and writes to the output queue
Thread 3 manages the output queue and writes records only when a reasonable number have accumulated

This approach recognizes that you want to minimize calls into the operating system for IO.

I would certainly look into java.util.concurrent.SynchronousQueue for Thread 1
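
A rough sketch of such a pipeline, using an ArrayBlockingQueue and a placeholder filter for illustration (a SynchronousQueue would hand each record off without buffering; the queue capacities and file names are assumptions):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    private static final int RECORD_SIZE = 651;
    private static final byte[] POISON = new byte[0]; // end-of-stream marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> inputQueue = new ArrayBlockingQueue<>(1024);
        BlockingQueue<byte[]> outputQueue = new ArrayBlockingQueue<>(1024);

        // Thread 1: read records and feed the input queue
        Thread reader = new Thread(() -> {
            try (BufferedInputStream in =
                     new BufferedInputStream(new FileInputStream("bigfile.dat"), 1 << 20)) {
                byte[] record = new byte[RECORD_SIZE];
                while (in.readNBytes(record, 0, RECORD_SIZE) == RECORD_SIZE) {
                    inputQueue.put(record.clone());
                }
                inputQueue.put(POISON);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        // Thread 2: decide how to treat each record, pass survivors to the output queue
        Thread filter = new Thread(() -> {
            try {
                byte[] record;
                while ((record = inputQueue.take()) != POISON) {
                    if (record[0] != 0) { // placeholder filter
                        outputQueue.put(record);
                    }
                }
                outputQueue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Thread 3: write records; the BufferedOutputStream batches the actual disk writes
        Thread writer = new Thread(() -> {
            try (BufferedOutputStream out =
                     new BufferedOutputStream(new FileOutputStream("filtered.dat"), 1 << 20)) {
                byte[] record;
                while ((record = outputQueue.take()) != POISON) {
                    out.write(record);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        reader.start();
        filter.start();
        writer.start();
        reader.join();
        filter.join();
        writer.join();
    }
}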

Bill
 