

JavaRanch » Java Forums » Java » Performance
Author

Method that eats too much heap

olze oli
Ranch Hand

Joined: Jun 20, 2009
Posts: 148
I wrote a little method that recursively lists the contents of a given directory.
The method works fine, but unfortunately it eats my memory, and when I use this method in programs that handle a lot of data (e.g. they call this method 3000 times) I get an OutOfMemoryError and I don't know why.
Here is the source:
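(The posted listing did not survive in this archive. A minimal sketch of a recursive lister consistent with the replies below — the method name getFilesFromDir and the list of getAbsolutePath() strings are taken from those replies — might look like this:)

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DirLister {
    // Sketch of a recursive directory lister like the one described in the
    // thread; the exact original code was not shown.
    public static List<String> getFilesFromDir(File dir) {
        List<String> result = new ArrayList<String>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return result; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // recurse into the subdirectory and keep all of its paths
                result.addAll(getFilesFromDir(entry));
            } else {
                result.add(entry.getAbsolutePath());
            }
        }
        return result;
    }
}
```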



Can someone please tell me what's wrong with that code? Thanks.
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336

The most likely reason is that you are loading more data into the heap than you have space for. If you need to load all files from a directory and its subdirectories, you will need to do some testing to work out how much heap is appropriate and increase it accordingly.


JavaRanch FAQ HowToAskQuestionsOnJavaRanch
olze oli
Ranch Hand

Joined: Jun 20, 2009
Posts: 148
The method quits normally when a directory is done; in my application I get a new dir and call this method again, just with another parameter, and so on. So it's not one directory where the heap becomes a problem; it's a problem when I run this method in a loop. But I don't understand why. Should I do something special with the ArrayList files? E.g. call files.clear() after I've finished working with them?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3010
    
olze oli wrote:The method quits normally when a directory is done; in my application I get a new dir and call this method again, just with another parameter, and so on. So it's not one directory where the heap becomes a problem; it's a problem when I run this method in a loop.

What loop, specifically? Are you running the code above, or are you running something else? Because the code shown above does have a loop, and a recursive call to getFilesFromDir(). That, fundamentally, is why you might well have a problem with memory usage here.

Consider a directory dirA, with a child directory dirB, with a child directory dirC. If you call getFilesFromDir(dirA), that method can't complete until it's completed the call to getFilesFromDir(dirB). And the call to getFilesFromDir(dirB) can't complete until it's completed the call to getFilesFromDir(dirC).

More generally, any call to getFilesFromDir() cannot complete until it has completed all calls to all its descendant directories. Yes, all of them. So, if the directory you're starting from has many children, whose children have many children, whose children's children have many children, whose children's children's children have many children, etc -- then the original call to getFilesFromDir() may use a LOT of memory before it completes. This should not be a surprise - it's built into the code you wrote.

Having said that, have you tried increasing the heap memory allocation? It's the most basic way to address this issue: give the JVM more memory. You need to look at the -Xmx option, e.g. java -Xmx512m YourApp.

There may be other ways to reduce the memory usage. I would suggest that storing the getAbsolutePath() of everything is wasteful, if memory is running out. All the paths you might see here share the same base path to the working directory - you don't need to store that again and again, every time a new path is stored. Using relative paths (e.g. getPath()) should do well enough, and it's shorter.

And why do you need to build a list of all these paths and put it into memory? What will it be used for? It may well be better to process each directory and each file as you go. Do whatever you need to do with each file - everything you need to do with that particular file - and then forget about it. Don't add the file, or file.getAbsolutePath(), or file.getPath(), to any List, if you can possibly help it. If you can avoid this, then your memory problems will probably vanish entirely.
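One way to sketch this process-as-you-go approach (the FileHandler callback interface here is illustrative, not from the thread):

```java
import java.io.File;

public class DirWalker {
    // Callback invoked once per file; nothing is accumulated in memory.
    public interface FileHandler {
        void handle(File file);
    }

    public static void walk(File dir, FileHandler handler) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry, handler);  // recurse, still without storing paths
            } else {
                handler.handle(entry); // process the file, then forget it
            }
        }
    }
}
```

Because each file is handed to the callback and then dropped, memory use no longer grows with the total number of files.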

If you really can't avoid it, and the other suggestions above don't help - if you've saved all the memory you could and it's still not enough, because there is fundamentally more data to process than you can possibly hold in memory - then you probably need to look at saving the data in some sort of repository, like a file or database. For what you describe, it would be simple to just write every new entry to a new line in a text file, and then read those lines later whenever you need to. Sure, it may be slower than keeping everything in memory. But any time you don't have enough memory, files and databases are a great alternative, not to be forgotten.
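A sketch of the write-each-entry-to-a-line idea (the class and method names are made up for illustration):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;

public class PathDump {
    // Append each file's path as one line to the given writer instead of
    // holding every path in a List in memory.
    public static void dumpPaths(File dir, BufferedWriter out) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                dumpPaths(entry, out);        // recurse into subdirectories
            } else {
                out.write(entry.getPath());   // one path per line
                out.newLine();
            }
        }
    }
}
```

Wrap the writer in try/finally (or read the lines back later with a BufferedReader) so the file is always flushed and closed.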
olze oli
Ranch Hand

Joined: Jun 20, 2009
Posts: 148
What loop, specifically? Are you running the code above, or are you running something else? Because the code shown above does have a loop, and a recursive call to getFilesFromDir().

The loop outside this recursive function. I will post it later.
That, fundamentally, is why you might well have a problem with memory usage here.

No, that's not the problem. I can process a directory easily: the heap size rises up to about 10MB and the method finishes. The problem is that this information isn't garbage collected; the heap grows constantly each time I call this method. When this method ends, I expect everything (the ArrayList) to get garbage collected, which doesn't happen at the moment. That's the problem.

Consider a directory dirA, with a child directory dirB,...

That's why I wrote a recursive method.

have you tried increasing the heap memory allocation?

No, and I don't want to do that. The heap should be allocated by the garbage collector, and I think 256MB must be enough for such a simple application.

Using relative paths (e.g. getPath()) should do well enough, and it's shorter.

I will try that.

It may well be better to process each directory and each file as you go.

I have a bunch of directories with programs in them, and I have to do some checks on each program. These checks depend on the file list (e.g. check for the existence of abc.exe, xyz.bat, etc.). When these checks are done, the information can be garbage collected.

It's something like this:
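(The posted snippet is missing. A sketch of the kind of per-program check described above — the file names abc.exe and xyz.bat come from the post, the class and method names are made up:)

```java
import java.util.List;

public class ProgramChecker {
    // Check a program directory's file list for required files, as the post
    // describes; the list would come from the recursive lister.
    public static boolean hasRequiredFiles(List<String> files) {
        boolean hasExe = false;
        boolean hasBat = false;
        for (String path : files) {
            if (path.endsWith("abc.exe")) hasExe = true;
            if (path.endsWith("xyz.bat")) hasBat = true;
        }
        return hasExe && hasBat;
    }
}
```

Once this returns, the list can be dropped and the next program directory processed.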


I will check what the workstation is doing in a few minutes... after breakfast.


Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3010
    
olze oli wrote:
Mike Simmons wrote:What loop, specifically? Are you running the code above, or are you running something else? Because the code shown above does have a loop, and a recursive call to getFilesFromDir().

The loop outside this recursive function. I will post it later.

Great. Now that you've posted the code that you understand so well, and we've wasted time discussing it, you might post the code that is actually causing problems for you. That would be nice.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3010
    
olze oli wrote:
Mike Simmons wrote:have you tried increasing the heap memory allocation?

No, and I don't want to do that. The heap should be allocated by the garbage collector, and I think 256MB must be enough for such a simple application.

Um, OK. But why stop there? I remember hearing that 64kb should be enough for anyone. Or was it 640kb? I forget.

Do you have any analysis to back these statements, besides "i dont want" and "i think"? Because I think you haven't given any information about how many files are really underneath (considering children, grandchildren, and all other descendants) the directory you're starting from. I think it's extremely plausible that you fundamentally are looking at a bigger job than you think you are, and you should really consider simply allocating more memory. Rather than just giving us a petulant "I don't want to". If you have some unstated reasons to back up your claims, please state them. Allocating more memory is simple and easy to try. Is there some reason you can't even make the attempt?
olze oli
Ranch Hand

Joined: Jun 20, 2009
Posts: 148
I use JVisualVM to display the used heap, and it's always about 5-6MB in my applications. IIRC the heap is separated into 3 sections (generations); maybe one of them is that size.
The heap size rises, if necessary, up to the limit; my limit at the moment is 256MB. I think this is the default, because I didn't change anything with the startup parameters or any JVM options.
I checked many directories for their size: one has 12 files, one has ~100 files, so there should be no problem. The biggest one had about 350 files.

Rather than just giving us a petulant "I don't want to"

Would you give a hello world 50MB of RAM without analyzing why it's giving an OOME? And if it throws an OOME, would you simply raise the heap? I wouldn't. That's the reason why I ask what could lead to this behavior: this is a function I (maybe) need often, so I want to know what's happening and whether I have to pay attention to something.

The interesting thing is that I changed only one line of code yesterday:
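(The changed line isn't shown here, but later posts in the thread confirm it was a files.clear() call. A sketch of the pattern:)

```java
import java.util.List;

public class ClearDemo {
    // The single changed line, per the later posts, was files.clear():
    // clear() drops the list's references to its elements, so they become
    // collectable even while the list variable itself stays in scope.
    public static List<String> processAndClear(List<String> files) {
        // ... the per-directory checks would run here ...
        files.clear(); // the added line
        return files;
    }
}
```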

and now it's working.

Can someone explain to me why?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3010
    
I guess that the "one line" is where you added "files.clear()"? Come on, speak clearly. It's not our job to figure out what you're talking about in order to help you.

What happens if you move the declaration "ArrayList files;" inside the loop? Because really, no code outside the loop has any reason to care about this variable.

Assuming this makes a difference, the problem is that you are close to a threshold where you simply NEED MORE MEMORY. The reason why some code works, and some code doesn't, is that the first "some code" is slightly more efficient, and the second "some code" is slightly less efficient. In this case, "slightly less efficient" means that you are wasting memory by declaring "files" in a larger scope than necessary. You can mostly fix this by calling files.clear(), but a better solution is to declare the variable "files" one level deeper than you have. Like this:
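(The original snippet is missing. A sketch of the variable-scope change being suggested, with buildFileList() standing in for the unposted getFilesFromDir():)

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeDemo {
    // Stand-in for the poster's list-building call.
    static List<String> buildFileList(int n) {
        List<String> files = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            files.add("file" + i);
        }
        return files;
    }

    // Declare the list INSIDE the loop: each iteration's list becomes
    // unreachable as soon as the iteration ends, so the GC can reclaim it
    // without an explicit clear().
    public static int processAll(int[] dirSizes) {
        int total = 0;
        for (int size : dirSizes) {
            List<String> files = buildFileList(size); // scoped to one iteration
            total += files.size();                    // run the per-dir checks here
        } // 'files' goes out of scope here; eligible for collection
        return total;
    }
}
```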

By the way, have you heard of the enhanced for loop available since JDK 5? It's pretty cool, and simplifies the code you just gave us. Check it out.
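For reference, a minimal example of the enhanced for loop (nothing here is from the thread):

```java
import java.util.List;

public class ForEachDemo {
    // The enhanced for loop (JDK 5+) replaces explicit index/iterator code.
    public static int totalLength(List<String> names) {
        int total = 0;
        for (String name : names) { // no index variable, no get(i)
            total += name.length();
        }
        return total;
    }
}
```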
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3010
    
olze oli wrote:Would you give a hello world 50MB of RAM without analyzing why it's giving an OOME? And if it throws an OOME, would you simply raise the heap? I wouldn't. That's the reason why I ask what could lead to this behavior: this is a function I (maybe) need often, so I want to know what's happening and whether I have to pay attention to something.

I agree with that sentiment - it's admirable, a good impulse. But "I don't want to" is still a weak reason for anything. If nothing else, raising the heap size is an easy thing to try, to understand where the limits of your program really are. You don't have to abandon the quest to understand why the program is using as much memory as it is - but you can gain valuable information about how much memory it's really using, and whether that amount is just a little more than you expected, or much much more than you expected.

No, I wouldn't give a "hello world" 50 MB of RAM without analyzing why it's giving an OOME. But your program is quite obviously doing much, much more than "hello world", and you've given no indication of how many files are really contained (considering children, grandchildren, etc.) within your base directory (wherever you're starting from). So thus far, it's not at all surprising that this program might use a lot of memory.
olze oli
Ranch Hand

Joined: Jun 20, 2009
Posts: 148
and you've given no indication of how many files are

I did. As mentioned before, the loop never gets more than ~350 files (IIRC 352). Then it ends and returns the ArrayList. But when it returns the ArrayList, the heap consumption still rises slightly; that's what I see with JVisualVM, so something is not freed after that method call.

If the program gives an OOME, it means that more than 256MB of RAM are used and cannot be freed by the GC. I think this is more than enough. Why give it more? I just don't get that. What would I see then?


Come on, speak clearly.

Sorry, I forgot to mention this. Yes, it's the line with files.clear().

I will try what you said, declaring the variable inside the loop, and let the PC run through the night to see what happens... though I read somewhere not to declare variables inside loops.

have you heard of the enhanced for loop available since JDK 5?

Yes, I have. If you take a look at the recursive method I use, you can see that I use it; the code block doesn't, because it was just quickly written while eating breakfast (so anyone can see how I use this method).
Paul Sturrock
Bartender

Joined: Apr 14, 2004
Posts: 10336


I think this is more than enough. Why give it more? I just don't get that

One reason would be if your application needs more.
 