aspose file tools
The moose likes Java in General and the fly likes Reading and Writing from Large File, OutOfMemory Big Moose Saloon
  Search | Java FAQ | Recent Topics
Register / Login
JavaRanch » Java Forums » Java » Java in General
Reply Bookmark "Reading and Writing from Large File, OutOfMemory" Watch "Reading and Writing from Large File, OutOfMemory" New topic
Author

Reading and Writing from Large File, OutOfMemory

Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
Hi All,

I have a directory which contains large files ~1GB to ~5GB. Recursively i pick up the files and from each file I look for a particular string or sequence of char and create files. The string is the starting and ending point of the sub file which i want to create.

I read some amount of bytes at a time and work on those bytes. However, I am getting OutOfMemory exception after some time. I have set Xmx to 2048. I try to read less number of bytes but then program gets very slow. I have added System.gc() and System.runFinalization() in between the object creation.

I am using RandomFileAccess.

Please let me know how I can over come the problem of OutOfMemory and program to run little faster.

Thanks
Anant
Jesper de Jong
Java Cowboy
Bartender

Joined: Aug 16, 2005
Posts: 12929
    
    3

Can you show us the source code? You probably have a memory leak somehwere, but it's very hard (or actually, impossible) to debug and point out the exact problem without seeing the source code.

You could try using a profiler, for example JVisualVM (which is included with JDK 6) to find out what is eating up all the memory.

Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 7 API documentation
Scala Notes - My blog about Scala
Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
Hi Jasper,

Thanks for the reply.

Here is the code.


Please let me know if anything required.
[edit[Break // comments and change to /* */ because the lines in code are too long for screen width. CR[/edit]
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

I find that code rather difficult to read, and the comments do more harm than good :(
Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
Hi David,

I have modified the comments.

I apologize for inconvenience.

If the code or comments still do not make sense, please let me know.. i will refactor it again.

Thanks
Anant
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

I agree with David. The code is hard to read. I would refactor/rewrite it to use a more object oriented approach and introduce a lot of new methods. Because a single method of 190 lines of code is hard to comprehend. The calls to System.gc() are unnecessary because the garbage collector is guaranteed to run before a OutOfMemoryError is thrown by the JVM.


"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
Hey Guys,

Sorry for being late. Here is the complete code. It has got more comments as well. I have refactored little. Please let me know if this is still not readable.


Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 16483
    
    2

I would use the profiler too. But I don't see why you need a 100 MB buffer to work in. In most situations a buffer size of somewhere around 8 KB is quite sufficient.

And I wouldn't do my own buffering like that. I'm not quite sure what rules you're using to filter the data but it seems to me that you're looking for the string "ABCD" as some kind of sentinel. I would rewrite the code to read the data sequentially using an ordinary FileInputStream. Set up a buffer which always contains the last four bytes read. Then you can compare that buffer to "ABCD" and act accordingly with respect to whether to write data.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

<offtopic>
Way, *way* too many comments. If single lines of code require a paragraph of comments, something has gone wrong.
</offtopic>
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32712
    
    4
Also the comments (don't use // on more than one line) and some of the code is in lines too long to read on screen.
Jesper de Jong
Java Cowboy
Bartender

Joined: Aug 16, 2005
Posts: 12929
    
    3

Some remarks:

Line 57, 59: You never need to call new String("literal"). Just write "literal" - it makes no sense to create a separate new String object.

If I see this right, when the program doesn't find the sequence ABCD anywhere in the file, it will try to append the whole file to your ByteArrayOutputStream. That could easily cause an OutOfMemoryError if the file is very large.
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24061
    
  13

Also, 131072 * 100 is not 100MB, it's about 1.25 MB. Just an example of what was meant by "the comments hurt more than they help".


[Jess in Action][AskingGoodQuestions]
Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
This is what I have to achieve.

There are big files and each file is made with joining many other files. The starting of nested file is ABCD. So I need to go through each byte and see if ABCD comes in the sequence. Once i find it, i need to start writing the bytes to files until i find another ABCD.


Paul, I can do the way you have mentioned but that makes my program very slow. And i have files with the size of 5GB as well. Reading 8/4 bytes at time will be a very slow progress. I will try to use profiler and see what they have to say.

David, Campbell,

I have written comments to get better understanding of the algorithm which i have implemented in the program. I can still shorten it. I will do it and post the code again.

Jasper,

Line 57, 59 are just for testing I have kept. Later on these strings will be passed as arguments to the program from the console.
The program doesn't append the whole file to ByteArrayOutputStream when we do not find ABCD. It will only append the current byte which is read and if any bytes A,AB,ABC where found before the current byte (given that current byte is not D).

File will be written/appended only when we find ABCD and RandomFileAccess object is not null. All the bytes written in BinaryArrayOutputStream will be dumped into RandomAccessFile.

Thanks everybody, I really appreciate your valuable time spending on this post.

I will redo the code/comments and post it again.

Regards,
Anant
Anant Jagania
Ranch Hand

Joined: Oct 20, 2004
Posts: 49
Ernest , Also, 131072 * 100 is not 100MB, it's about 1.25 MB. Just an example of what was meant by "the comments hurt more than they help".


My apology.

Thanks for pointing this out.

I have corrected back the comments and number of bytes to read (MAXREAD) in the code.

Please have a look at it.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32712
    
    4
Please don't post the code again. My point was not that the comments were too long, but that they were
allwrittenasasinglelinewhi . . . wthescrollbarsappear.

See next post for what it ought to have said
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32712
    
    4
I myself wrote: . . .
all written as a single line which nobody could read because it is too wide for the screen even here at work where I have a screen with 1440px from left to right so it is really difficult to read on an ordinary screen or with large writing in fact it is a good idea to keep all line lengths below 80 keystrokes to make it easier to read and I look forward to Maneesh Godbole posting UseRealWords but you will see how the scroll bars appear.
. . . or would have had I UsedRealWords.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Well, I think the comments are too long, and don't add to our understanding. They might add to *yours*, but then I'd recommend taking them out so they don't confuse *us*, the folks trying to understand your code.The method is named wrong. With the proper name, no comment would be necessary. It also has a side effect, which makes the naming somewhat more difficult, but if it was clear that all file pointer manipulation methods also updated the numberOfData (which is also not a great name), it wouldn't be necessary to include it.

Rather than rename the method, why not just pass in the actual pointer manipulation you want?Otherwise it'd have to be called moveFilePointerBack or something, and it just seems redundant.Here you've described much of what's already known about RandomFileAccess, and again, you are forced to make a comment that would be unnecessary if the method was named appropriately.Here the comment describes precisely what the code does, but takes 2.5x as much text as just the code--this code is self-explanatory. The comment adds unnecessary cognitive overhead.

These same meta-comments apply to much of the code: the code speaks louder than the comments. The code is the ultimate arbiter of what's happening. Comments are difficult to keep in sync with code, because they cannot be (trivially) mechanically checked for correctness. Naming, refactoring, and common sense do more to aid readability than a paragraph for each line of code.

(This is obviously just my opinion, and other folks may feel differently.)
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 10826
    
  25

How many times do you create a new RandomAccessFile and assign it to fos? And how many times do you close the old streams?

You would probably benefit greatly by moving most of your instance variables to method variables. It will then become more obvious as to when they are not being used correctly.

You would probably also benefit from renaming methods and variables (normally if I see a variable named fos, I expect it to be a FileOutputStream - not a RandomAccessFile (often named raf)).
Neither fos nor raf are good descriptions of what the file is being used for though.


The Sun Certified Java Developer Exam with J2SE 5: paper version from Amazon, PDF from Apress, Online reference: Books 24x7 Personal blog
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Reading and Writing from Large File, OutOfMemory
 
Similar Threads
OutOfMemory error but application continues
OutOfMemory error
OutOfMemory Using JDK 1.3
outofmemory reading text file
OutOfMemory Error