
Load 3 GB data to java memory

 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Hello All,
I have a lookup file (around 3 GB), formatted as below, and I would like to load that file into Java memory. I would like to know whether this is possible. Could anyone please guide me?

1|100
2|200
3|300

Thanks
Bala
 
Stuart A. Burkett
Ranch Hand
Posts: 679
Assuming you are running a JRE that can support that much data in memory, then it will depend on what you intend doing with the data. You will need to store it in a collection of some kind - array, List, Set, Map.
But before you do that you should revisit your design to make sure you really need to have all that data in memory at once - is it possible to load just part of the data at a time? If so, you will be less restricted as to which machines your program can run on.
Once you've done that you need to read up on the collections I mentioned to decide which is most suitable for your needs.
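
If it really is a key/value lookup, a Map is the obvious candidate. A minimal sketch of loading a pipe-delimited file like yours into one (this assumes the whole thing fits in your heap, and it uses generics, so JRE 5 or later; the class and method names are just placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LookupLoader {

    // Reads a "key|value" file line by line and builds an in-memory map.
    public static Map<String, String> load(String path) throws IOException {
        Map<String, String> lookup = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                int sep = line.indexOf('|');
                if (sep > 0) {
                    lookup.put(line.substring(0, sep), line.substring(sep + 1));
                }
            }
        } finally {
            in.close();
        }
        return lookup;
    }
}

Bear in mind that a 3 GB text file grows considerably once every line becomes two String objects plus map overhead, so the heap you would actually need is well over 3 GB.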
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Thank you so much for your prompt reply. Let me explain in detail, so that you can guide me if possible.

JRE: 1.4
RAM: 3 GB

I have around 20,000 files (around 1 or 2 million records in each file) containing column A, and another file with columns A and B (this is the lookup file). I now have to iterate through all 20,000 files and replace each column A value with the corresponding column B value from the lookup file. This is the requirement.

I am looking for an option that does not require loading the data into a database.

Thanks in advance
Bala
 
Stuart A. Burkett
Ranch Hand
Posts: 679
So it sounds like the only thing you need to keep in memory is the lookup file. You then just read each of the other files line by line. For each line you make the required changes and then write it out to a temporary file. Once you've processed every line in the file, you delete it and then rename your temporary file to the name of the original file.
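
A rough sketch of that per-file loop, reusing a lookup map like the one above (the names are made up for illustration, and it assumes each data file holds one column-A value per line):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;

public class FileRewriter {

    // Replaces each column-A value with its column-B value from the lookup,
    // writing to a temporary file which then replaces the original.
    public static void rewrite(File source, Map<String, String> lookup) throws IOException {
        File tmp = new File(source.getPath() + ".tmp");
        BufferedReader in = new BufferedReader(new FileReader(source));
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(tmp)));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String replacement = lookup.get(line.trim());
                out.println(replacement != null ? replacement : line);
            }
        } finally {
            in.close();
            out.close();
        }
        if (!source.delete() || !tmp.renameTo(source)) {
            throw new IOException("Could not replace " + source);
        }
    }
}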
 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Yes. I need to keep only that lookup file in memory, which is around 3 GB. Is it possible?
 
William P O'Sullivan
Ranch Hand
Posts: 859
JRE 1.4 ???

Major memory handling and performance issues were fixed in 1.5+. I would highly suggest an upgrade.

If you have 3 GB of RAM, how on earth do you expect to load a 3 GB file without swapping/paging the system to death?

Is this running on Windoze, 32 or 64bit?

As Stuart said, maybe you need to rethink your needs. What about a random access file? Only load what you actually need.
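
One way to follow that suggestion, as a sketch only: it assumes the lookup file is sorted by key and that the keys compare correctly as plain strings (e.g. zero-padded numbers). Keep a sparse index of every N-th key and its byte offset in memory, then seek with RandomAccessFile and scan a short stretch for each lookup (TreeMap.floorEntry needs Java 6+):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

public class SparseIndexLookup {

    private static final int EVERY = 10000; // keep one index entry per 10,000 lines

    private final RandomAccessFile file;
    private final TreeMap<String, Long> index = new TreeMap<String, Long>();

    // One sequential pass to build the sparse index of key -> byte offset.
    // Assumes well-formed "key|value" lines; note that RandomAccessFile.readLine
    // is unbuffered, so this pass is slow but simple.
    public SparseIndexLookup(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
        long offset = 0;
        int count = 0;
        String line;
        while ((line = file.readLine()) != null) {
            if (count % EVERY == 0) {
                index.put(line.substring(0, line.indexOf('|')), offset);
            }
            count++;
            offset = file.getFilePointer();
        }
    }

    // Seek to the nearest indexed offset at or before the key, then scan forward.
    public String find(String key) throws IOException {
        Map.Entry<String, Long> start = index.floorEntry(key);
        if (start == null) {
            return null; // sorts before the first key in the file
        }
        file.seek(start.getValue());
        String line;
        while ((line = file.readLine()) != null) {
            int sep = line.indexOf('|');
            String k = line.substring(0, sep);
            if (k.equals(key)) {
                return line.substring(sep + 1);
            }
            if (k.compareTo(key) > 0) {
                return null; // passed the point where the key would be
            }
        }
        return null;
    }
}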

WP
 
Winston Gutkowski
Bartender
Posts: 10417
Balasubramaniam Muthusamy wrote:Yes. I need to keep only that look up file which is around 3GB. is it possible?

A 3Gb lookup file? Even assuming it's text (which it probably shouldn't be), I would reckon a 20,000 line lookup file would fit into a few meg.

Methinks your problems start a lot further back than this.
Why on earth would anyone keep 20,000 files around to support a system? Especially ones of that size?
The only possible reason I can think of is that it's independently distributed and that this is some sort of 'batched' update involving temporary files, or some "database" made up of a bunch of redundant copies of "data"; in which case why not just bite the bullet and implement a proper one?

Winston
 
fred rosenberger
lowercase baba
Bartender
Posts: 12122
I think it also depends on whether this is a one-off run, or something you need to run daily/hourly/weekly....

If it's a one-off, you could read in part of the lookup file, process the 20,000 data files, then read the next chunk. You'd need logic to handle interruptions, but I think those would be solvable. So write it, and let it go for as long as it takes.
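
A sketch of that chunked outer loop (the chunk size and names are only illustrative, and it assumes a value written in an earlier pass can never collide with a key looked up in a later pass; otherwise you'd need to mark already-replaced lines):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ChunkedReplace {

    private static final int CHUNK_SIZE = 5000000; // lookup entries per pass; tune to your heap

    public static void run(File lookupFile, File[] dataFiles) throws IOException {
        BufferedReader lookup = new BufferedReader(new FileReader(lookupFile));
        try {
            Map<String, String> chunk = new HashMap<String, String>();
            String line;
            while ((line = lookup.readLine()) != null) {
                int sep = line.indexOf('|');
                if (sep > 0) {
                    chunk.put(line.substring(0, sep), line.substring(sep + 1));
                }
                if (chunk.size() >= CHUNK_SIZE) {
                    processAll(dataFiles, chunk);
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                processAll(dataFiles, chunk);
            }
        } finally {
            lookup.close();
        }
    }

    // Rewrites every data file using just the current slice of the lookup,
    // leaving lines whose keys are not in this slice for a later pass.
    private static void processAll(File[] dataFiles, Map<String, String> chunk) throws IOException {
        for (int i = 0; i < dataFiles.length; i++) {
            FileRewriter.rewrite(dataFiles[i], chunk); // same temp-file rewrite sketched earlier
        }
    }
}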

Alternatively, you could process one of the 20k files in its entirety, writing the changes to a .tmp file. Then, when you complete the lookup file, you rename the .tmp to the original name. By looking at timestamps, you could figure out which files had been done and which hadn't. If the job is killed, the .tmp file can be discarded, and you would restart on the untouched source file...

Sure, this will take a while, but what do you expect when you have 20,000,000,000 records to process against a 3GB file...


 
Balasubramaniam Muthusamy
Ranch Hand
Posts: 55
Thank you so much for all your replies.

This is just a one-shot fix and is not going to run any more. My lookup file has around 125 million records. Is there any way we can split it into chunks of data and process them? I am also wondering whether there is some kind of index option, or whether RandomAccessFile would help.

Thanks much again
 
Winston Gutkowski
Bartender
Posts: 10417
Balasubramaniam Muthusamy wrote:This is just a one-shot fix and is not going to run any more.

Hmmm. Hate to say, but I've heard that before.

My lookup file has around 125 million records. Is there any way we can split it into chunks of data and process them? I am also wondering whether there is some kind of index option, or whether RandomAccessFile would help.

Sure, there are plenty of splitter utilities out there; or you could simply write one yourself (possibly better if the "splitting" is dependent on the data you're working on) and run it before your main update. Perl or awk are also very good for that sort of stuff.
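
If you did want to roll your own splitter in Java, it really is only a few lines (a sketch; the part size and naming scheme are arbitrary):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class Splitter {

    // Splits a large text file into numbered pieces of linesPerPart lines each.
    public static void split(String path, int linesPerPart) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            int count = 0;
            int part = 0;
            PrintWriter out = null;
            while ((line = in.readLine()) != null) {
                if (count % linesPerPart == 0) {
                    if (out != null) {
                        out.close();
                    }
                    out = new PrintWriter(new FileWriter(path + ".part" + (part++)));
                }
                out.println(line);
                count++;
            }
            if (out != null) {
                out.close();
            }
        } finally {
            in.close();
        }
    }
}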

But like I said before, from the little you've given us to go on, I suspect your problems start long before this. It just sounds 'off'. And unless you fix that you're probably doomed to repeat this exercise.

Winston
 