Load 3 GB data to Java memory

Balasubramaniam Muthusamy
Ranch Hand

Joined: Nov 30, 2010
Posts: 51
Hello All,
I have a lookup file (around 3 GB) as below and I would like to load that file into Java memory. I would like to know whether this is possible. Could anyone please guide me?

1|100
2|200
3|300

Thanks
Bala
Stuart A. Burkett
Ranch Hand

Joined: May 30, 2012
Posts: 679
Assuming you are running a JRE that can support that much data in memory, it will depend on what you intend to do with the data. You will need to store it in a collection of some kind - an array, List, Set, or Map.
But before you do that you should revisit your design to make sure you really need to have all that data in memory at once - is it possible to load just part of the data at a time? If so, you will be less restricted as to which machines your program can run on.
Once you've done that, you need to read up on the collections I mentioned to decide which is most suitable for your needs.
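For reference, here is a minimal sketch of what loading a pipe-delimited lookup file like the one above into a Map might look like (the LookupLoader class name is made up, and it assumes Java 5+ syntax and a heap big enough to hold the whole file):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LookupLoader {

    // Reads a pipe-delimited "key|value" file into a Map.
    public static Map<String, String> load(String path) throws IOException {
        Map<String, String> lookup = new HashMap<String, String>();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf('|');
                if (sep > 0) {
                    lookup.put(line.substring(0, sep), line.substring(sep + 1));
                }
            }
        } finally {
            reader.close();
        }
        return lookup;
    }
}

A lookup is then just lookup.get(key). Bear in mind that the in-memory Map will typically need considerably more heap than the raw file, because of per-entry and per-String overhead, which is one more reason to question loading it all at once.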
Balasubramaniam Muthusamy
Ranch Hand

Joined: Nov 30, 2010
Posts: 51
Thank you so much for your prompt reply. Let me explain in detail so that you can guide me if possible.

JRE: 1.4
RAM: 3 GB

I have around 20,000 files (around 1 or 2 million records in each file) with column A, and another file with columns A and B (this is like a lookup file). Now I have to iterate through all those 20,000 files, and column A has to be replaced with column B (from the lookup file). This is the requirement.

I am looking for an option that does not require loading the data into a database.

Thanks in advance
Bala
Stuart A. Burkett
Ranch Hand

Joined: May 30, 2012
Posts: 679
So it sounds like the only thing you need to keep in memory is the lookup file. You then just read each of the other files line by line. For each line you make the required changes and then write it out to a temporary file. Once you've processed every line in the file, you delete the original and rename your temporary file to the name of the original file.
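A rough sketch of that per-file pass (the FileRewriter name is made up; it assumes each data file holds one column-A value per line, as described above, and that the lookup Map has already been loaded, for instance as in the earlier sketch):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;

public class FileRewriter {

    // Rewrites one data file: every value found in the lookup map is replaced,
    // the result goes to a temporary file, and the temporary file then
    // replaces the original.
    public static void rewrite(File dataFile, Map<String, String> lookup) throws IOException {
        File tmp = new File(dataFile.getPath() + ".tmp");
        BufferedReader in = new BufferedReader(new FileReader(dataFile));
        BufferedWriter out = new BufferedWriter(new FileWriter(tmp));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String replacement = lookup.get(line.trim());
                out.write(replacement != null ? replacement : line);
                out.newLine();
            }
        } finally {
            in.close();
            out.close();
        }
        if (!dataFile.delete() || !tmp.renameTo(dataFile)) {
            throw new IOException("Could not replace " + dataFile);
        }
    }
}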
Balasubramaniam Muthusamy
Ranch Hand

Joined: Nov 30, 2010
Posts: 51
Yes. I need to keep only that lookup file, which is around 3 GB. Is it possible?
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

JRE 1.4 ???

Major memory handling and performance issues were fixed in 1.5+. I would highly suggest an upgrade.

If you have 3 GB of RAM, how on earth do you expect to load a 3 GB file without swapping/paging the system to death?

Is this running on Windoze, 32-bit or 64-bit?

As Stuart said, maybe you need to rethink your needs. What about a random access file? Only load what you actually need.

WP
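For what it's worth, one hedged sketch of the random-access idea is a binary search with RandomAccessFile. It assumes the lookup file has first been sorted by key and padded so every record is a fixed number of bytes, which is not how the file is described above, so treat it purely as an illustration:

import java.io.IOException;
import java.io.RandomAccessFile;

public class FixedWidthLookup {

    // Assumed record size in bytes, including the line terminator.
    private static final int RECORD_LENGTH = 32;

    // Binary search over a lookup file that has been pre-sorted by key and
    // padded to fixed-length "key|value" records. Keys must compare the same
    // way the file was sorted (e.g. zero-padded numbers). Nothing is held in memory.
    public static String find(RandomAccessFile file, String key) throws IOException {
        long low = 0;
        long high = file.length() / RECORD_LENGTH - 1;
        byte[] buffer = new byte[RECORD_LENGTH];
        while (low <= high) {
            long mid = (low + high) / 2;
            file.seek(mid * RECORD_LENGTH);
            file.readFully(buffer);
            String record = new String(buffer, "US-ASCII").trim();
            int sep = record.indexOf('|');
            int cmp = record.substring(0, sep).compareTo(key);
            if (cmp == 0) {
                return record.substring(sep + 1);    // found
            } else if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return null;    // key not present
    }
}

The payoff is that nothing needs to be loaded up front; the price is sorting and padding the 3 GB file once beforehand.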
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7042
    

Balasubramaniam Muthusamy wrote:Yes. I need to keep only that lookup file, which is around 3 GB. Is it possible?

A 3 GB lookup file? Even assuming it's text (which it probably shouldn't be), I would reckon a 20,000-line lookup file would fit into a few megabytes.

Methinks your problems start a lot further back than this.
Why on earth would anyone keep 20,000 files around to support a system? Especially ones of that size?
The only possible reason I can think of is that it's independently distributed and that this is some sort of 'batched' update involving temporary files, or some "database" made up of a bunch of redundant copies of "data"; in which case why not just bite the bullet and implement a proper one?

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10911
    

I think it also depends on whether this is a one-off run, or something you need to run daily/hourly/weekly....

If it's a one-off, you could read in part of the lookup file, process the 20,000 data files, then read the next chunk. You'd need logic to handle interruptions, but I think that would be solvable. So write it, and let it run for as long as it takes.

Alternatively, you could process one of the 20,000 files in its entirety, writing the changes to a .tmp file. Then, when you've finished processing that file against the lookup, you rename the .tmp to the original name. By looking at timestamps, you could figure out which files had been done and which hadn't. If the job is killed, the .tmp file can be discarded, and you would restart on the untouched source file...

Sure, this will take a while, but what do you expect when you have 20,000,000,000 records to process against a 3 GB file...
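As a sketch of the chunked approach (the class name and chunk size are made up, it reuses the FileRewriter-style pass sketched earlier in the thread, and it leaves out the interruption handling mentioned above):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ChunkedUpdater {

    // Lookup entries held in memory per pass; tune to the available heap.
    private static final int CHUNK_SIZE = 5000000;

    public static void main(String[] args) throws IOException {
        File lookupFile = new File(args[0]);   // the 3 GB key|value file
        File dataDir = new File(args[1]);      // directory holding the 20,000 data files
        BufferedReader lookup = new BufferedReader(new FileReader(lookupFile));
        try {
            Map<String, String> chunk = new HashMap<String, String>();
            String line;
            while ((line = lookup.readLine()) != null) {
                int sep = line.indexOf('|');
                chunk.put(line.substring(0, sep), line.substring(sep + 1));
                if (chunk.size() == CHUNK_SIZE) {
                    applyChunk(chunk, dataDir);
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                applyChunk(chunk, dataDir);    // last partial chunk
            }
        } finally {
            lookup.close();
        }
    }

    // One pass over every data file: keys found in this chunk are replaced,
    // everything else is copied through unchanged for a later pass.
    // (Assumes replaced values can never themselves appear as keys in a later chunk.)
    private static void applyChunk(Map<String, String> chunk, File dataDir) throws IOException {
        File[] dataFiles = dataDir.listFiles();
        for (int i = 0; i < dataFiles.length; i++) {
            FileRewriter.rewrite(dataFiles[i], chunk);
        }
    }
}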



There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Balasubramaniam Muthusamy
Ranch Hand

Joined: Nov 30, 2010
Posts: 51
Thank you so much for all your replies.

This is just a one-shot fix and will not be run again. My lookup file has around 125 million records. Is there any way we can split the data into chunks and process them? I am also wondering whether there is any kind of index option, or something like RandomAccessFile?

Thanks much again
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7042
    
  16

Balasubramaniam Muthusamy wrote:This is just a one-shot fix and will not be run again.

Hmmm. Hate to say, but I've heard that before.

My lookup file has around 125 million records. Is there any way we can split the data into chunks and process them? I am also wondering whether there is any kind of index option, or something like RandomAccessFile?

Sure, there are plenty of splitter utilities out there, or you could simply write one yourself (possibly better if the "splitting" depends on the data you're working on) and run it before your main update. Perl or awk are also very good for that sort of thing.
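If a home-grown splitter is the route taken, a sketch like this (the FileSplitter name and the part-file naming are arbitrary) is about all it needs:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FileSplitter {

    // Splits a large text file into numbered parts of linesPerPart lines each,
    // written alongside the original as <path>.part0, <path>.part1, ...
    public static void split(String path, int linesPerPart) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        BufferedWriter out = null;
        try {
            String line;
            int lineCount = 0;
            int part = 0;
            while ((line = in.readLine()) != null) {
                if (lineCount % linesPerPart == 0) {
                    if (out != null) {
                        out.close();
                    }
                    out = new BufferedWriter(new FileWriter(path + ".part" + part++));
                }
                out.write(line);
                out.newLine();
                lineCount++;
            }
        } finally {
            in.close();
            if (out != null) {
                out.close();
            }
        }
    }
}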

But like I said before, from the little you've given us to go on, I suspect your problems start long before this. It just sounds 'off', and unless you fix that you're probably doomed to repeat this exercise.

Winston
 