Hello,

I am writing a Java program which, in a nutshell, reads from a file and writes to a file. The input file looks like this:

Bush is the President of USA.
5

The output file will then contain:

Bush is the President of USA.
Bush is the President of USA.
Bush is the President of USA.
Bush is the President of USA.
Bush is the President of USA.

Basically, it repeats the first line the number of times given on the second line of the input file (5 in the example above). The program works fine, but I have now noticed a problem: when the number of iterations is very high, say 1,000,000,000 (1 billion), it fails with this error:

Exception in thread "main" java.lang.OutOfMemoryError

How should I handle this OutOfMemoryError? Your input will be much appreciated. Thanks.
1. What does your code look like? Maybe you're creating lots of objects unnecessarily and not releasing the references, so they're taking up all your memory.
2. You can adjust the heap size of your JVM (but normally this will only delay the problem if you have a memory leak somewhere). To adjust the heap size, use the -Xmx and -Xms parameters when starting your JVM (run java -X for specific info on the -X options), e.g.: java -Xmx256M com.jess.MyClass (this starts the JVM with a maximum heap size of 256 megabytes; note that there is no space between -Xmx and the size).
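To confirm that a heap setting actually took effect, a small check like the following may help. This is only a sketch: the class name HeapCheck and the helper toMegabytes are made up for illustration, and the value reported by Runtime.maxMemory() can be slightly lower than the -Xmx value, depending on the JVM.

```java
class HeapCheck {
    /** Converts a byte count to whole megabytes (hypothetical helper). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Start the JVM with, for example:  java -Xmx256m HeapCheck
        // The size is attached directly to the flag, with no space in between.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap available to this JVM: "
                + toMegabytes(maxBytes) + " MB");
    }
}
```

Running it once without any flag and once with -Xmx256m should show whether the option is being picked up.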
From the description of what you're trying to do, there is no reason you need to be using enough memory to require changing the JVM memory settings. You're doing something which is saving things in memory - adding to a StringBuffer perhaps? Putting strings in a Vector or other object? No need. Just write your message directly to whatever Writer / OutputStream / RandomAccessFile you're using for output, however many times you need to. The message should not be building up in RAM - it should be getting written to a file. If you still have problems, please show the code you're using to write the file. Good luck...
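A minimal sketch of that idea, assuming the input is a message line followed by a count line. The class name RepeatLine and the method repeat are made up for illustration; only the current line is ever held in memory, so the count can be arbitrarily large.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class RepeatLine {
    /** Reads a message line and a count line, then writes the message that many times. */
    static void repeat(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String message = reader.readLine();
        String countLine = reader.readLine();
        if (message == null || countLine == null) {
            throw new IOException("Expected a message line followed by a count line");
        }
        long count = Long.parseLong(countLine.trim());

        BufferedWriter writer = new BufferedWriter(out);
        for (long i = 0; i < count; i++) {
            // Each copy goes straight to the writer; nothing accumulates in RAM.
            writer.write(message);
            writer.newLine();
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Usage (file names are placeholders): java RepeatLine input.txt output.txt
        try (Reader in = new FileReader(args[0]);
             Writer out = new FileWriter(args[1])) {
            repeat(in, out);
        }
    }
}
```

The BufferedWriter keeps disk writes efficient without letting the output build up in memory.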
"I'm not back." - Bill Harding, Twister
I realized that the problem is in another method. Here is what that method is supposed to do.

The input file (input.txt) looks like this:

5575421,CNN,65431
5575422,FOX,65432
5575423,CNN,65433
5575424,CNN,65434
5575425,CNN,65435
5575426,CNN,65436
5575421,CNN,65437
5575422,FOX,65438
5575423,CNN,65439
5575424,USA,65440

The output will contain the following:

1. The total number of records in the input file (10 in the example above).
2. The number of unique values in the first column (6 in this case).
3. The number of unique values in the second column (3 in this case: CNN, FOX and USA).

This is what my code looks like:
This works fine, as I said in my previous post, but when the input file has a large number of records (say 1 billion or so) it crashes with an OutOfMemoryError. So where is the memory leak? What is a better way to handle such a large file? Your input is appreciated. Thanks. [ edited to correct code formatting -ds ] [ August 07, 2002: Message edited by: Dirk Schreckmann ]
Well, if you keep putting values into a HashSet, it gets bigger. I wouldn't call that a "leak"; it's the expected behavior here. You just need to find a more efficient way to do it. First, do as Jessica suggested and increase the memory allocated to your JVM. This will be simplest, if you have the memory to spare. Try separating your loop into two loops - one for hs1; another for hs2. Sure, you'll spend twice as much time reading lines, but this way you won't have to keep both in memory at the same time. (Set hs1 = null once you're done with it, before the second loop.) This will only have a significant effect if the two HashSets are of roughly comparable size. If hs2 only has a few hundred entries at most, it's not worthwhile putting it in a separate loop. Next, try replacing
(Likewise for hs2.) This may seem strange, but the String created by substring() uses the same character array as the original String you got from readLine(). This means that even though you're only interested in, say, the first 7 characters of it, you're actually keeping the whole line in memory when you save a reference to the substring. By using the new String() constructor, you create a new String that uses only enough memory for the substring - not for the extra chars of the full original line.

It looks like the first field is always a number. If so, you can probably get additional space savings by storing it as an Integer rather than a String. (An int would be even more efficient, but collections don't work with primitives.)

Going a step further, if you're going to have a lot of different integer values, it may well be more efficient to use a BitSet rather than a HashSet. If the first column is always an integer less than 10,000,000, you can allocate a BitSet with 10,000,000 bits (~1.2 MB) which will keep track of each number individually. This may seem like a lot, but the advantage is that it will never have to grow in size (unless you try to store a number greater than 10 million). If you have 100 unique values to save, the HashSet will be smaller - but if you have 1,000,000, the BitSet will certainly be smaller. You can optimize further if you can restrict the range of possible values. E.g. if the number will always be between 5 and 6 million, you can subtract 5 million, and store numbers 0 to one million - taking a tenth as much memory as 0 to ten million did.

I'm guessing these techniques (particularly the BitSet) will take care of the problem for you. If not, you'll probably need to look into alternate techniques which do not require you to represent all values read in memory, but instead rely on writing some things to intermediate files instead.
Probably the easiest way to do this is to use a database, which will have already coded this functionality for you. Good luck... [ August 07, 2002: Message edited by: Jim Yingst ]
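The BitSet suggestion from the post above could look roughly like this. It is a sketch, not the original poster's code: the class name ColumnStats, the method count, and the MAX_ID bound are assumptions, and it expects well-formed lines of the form id,name,number with a non-negative integer id. The new String(...) copy only saves memory on older JVMs where substring() shares the parent's character array (since Java 7 update 6, substring() makes its own copy), so on a current JVM it is harmless but unnecessary.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class ColumnStats {
    // Assumption: first-column values are integers in the range [0, MAX_ID).
    static final int MAX_ID = 10_000_000;

    /** Returns {total records, unique first-column values, unique second-column values}. */
    static long[] count(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BitSet seenIds = new BitSet(MAX_ID);       // one bit per possible id
        Set<String> seenNames = new HashSet<>();   // expected to stay small
        long total = 0;

        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int firstComma = line.indexOf(',');
            int secondComma = line.indexOf(',', firstComma + 1);

            int id = Integer.parseInt(line.substring(0, firstComma).trim());
            seenIds.set(id); // setting an already-set bit changes nothing

            // Copy the field so the set does not pin the whole line (older JVMs).
            seenNames.add(new String(line.substring(firstComma + 1, secondComma)));
            total++;
        }
        return new long[] { total, seenIds.cardinality(), seenNames.size() };
    }

    public static void main(String[] args) throws IOException {
        // Usage (file name is a placeholder): java ColumnStats input.txt
        try (Reader in = new FileReader(args[0])) {
            long[] stats = count(in);
            System.out.println("Records: " + stats[0]);
            System.out.println("Unique first column: " + stats[1]);
            System.out.println("Unique second column: " + stats[2]);
        }
    }
}
```

Memory use here depends on the largest id and the number of distinct names, not on the number of records, which is what makes it suitable for very large files.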
I could not understand these messages.

When I used the -Xmx option like this:

java -Xmx 256M Test

I got the following message:

Incompatible initial and maximum heap sizes specified:
initial size: 1048576 bytes, maximum size: 0 bytes
The initial heap size must be less than or equal to the maximum heap size.
The default initial and maximum heap sizes are 1048576 and 67108864 bytes.
Could not create the Java virtual machine.

When I used the -Xms option like this:

java -Xms 256M Test

I got the following message:

The specified initial heap size is too small. (262144 bytes required.)
Could not create the Java virtual machine.

I don't know how to increase the size of the VM. Is there a link where I can get more info on these options? Am I doing something wrong here?