how to iterate over files in a directory

Nicole Lacoste
Ranch Hand

Joined: Oct 04, 2006
Posts: 30
Hi All,

I have a directory that contains many files (at least 400,000) whose names I don't know anything about. I want to retrieve these files one by one.

At first I thought I could use one of the java.io.File methods, File[] listFiles() or String[] list(), but the JVM runs out of memory if I try to do that. I have been looking for something like FileIterator, but this doesn't seem to exist. Then I tried what seemed at the time a good idea: using the java.io.File method String[] list(FilenameFilter filter) with a FilenameFilter implementation that looks like this:

import java.io.File;
import java.io.FilenameFilter;

class MyFilenameFilter implements FilenameFilter
{
    private final long start;
    private final long end;
    private long current; // number of entries counted so far

    MyFilenameFilter(final long start, final long end)
    {
        // some checks on start and end
        this.start = start;
        this.end = end;
        this.current = 0;
    }

    /**
     * Picks only files in the interval [start, end)
     */
    public final boolean accept(final File dir, final String name)
    {
        if (current < start)
        {
            // still before the window: count the entry and skip it
            ++current;
            return false;
        }
        if (current >= end)
        {
            // past the window: no need to keep counting
            return false;
        }
        ++current;
        return true;
    }
}

This actually doesn't solve the problem because inside the list(FilenameFilter) method the list() method is called anyway!

Any ideas of how to do this? Is there some FileIterator I just don't know about?
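
For readers on Java 7 or later (newer than this thread), java.nio.file.Files.newDirectoryStream comes close to the FileIterator asked about here: it hands back entries as you iterate rather than building one big array. A minimal sketch, assuming that API is available; the class name LazyDirectoryWalk and the println action are illustrative only:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; not part of the original post.
final class LazyDirectoryWalk {

    /** Visits each entry of dir one at a time and returns how many were seen. */
    static long visitEntries(Path dir) throws IOException {
        long count = 0;
        // Entries are read as the loop advances, so the whole listing is
        // not held in the Java heap at once.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                System.out.println(entry.getFileName());
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(visitEntries(dir) + " entries");
    }
}
```

The stream has to be closed (here via try-with-resources) to release the underlying directory handle.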


Thanks,

Niki

PS I tried to follow the java.io.File code further but always ended up at OS-dependent native methods
[ January 30, 2007: Message edited by: Nicole Lacoste ]
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Haven't tried the following, but how about it?

Open a temporary file. Ask Java to list files in the directory, using your FileFilter. Every time Java calls your FileFilter, append a line containing the leaf name to the temporary file. Return false, so Java returns you no files. Then iterate through the contents of the temporary file.

(Later) No, wait, I didn't read your post properly. This won't work for the same reason you say your attempt didn't work. That is, the filtered version of list() uses the unfiltered version.
[ January 30, 2007: Message edited by: Peter Chase ]

Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24168
    

If you're running out of memory, note that the JVM uses a fixed-size heap for Java objects. You can specify a larger size when you start Java with the "-XmxNNm" switch, where NN is some number of megabytes. The default is 64 (most of the time.)

The total amount of memory probably depends strongly on the actual path to the directory. If the path is "C:\SOMETHING\SOMETHING ELSE\WHEREVER\HERE\NOW", then that's about 100 bytes (50 chars times 2 bytes per char) just for the path, plus about 16 bytes for the File object, plus 16 for the String plus 16 for the array in the String, for a minimum of about 150 bytes per File object.

400,000 files times 150 bytes is about 60 MB -- too much to fit in the default Java heap along with everything else. But it would fit nicely in a 100MB heap. Try -Xmx128m and see if that helps. Try doubling that again as a last-ditch effort in case the File objects contain other members I'm not thinking of.
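
The estimate above can be redone for other path lengths and file counts with a few lines of Java. This is only a back-of-envelope sketch: the three 16-byte overheads are the rough guesses from this post, not measured values, and the class name FileListEstimate is made up for illustration.

```java
// Hypothetical helper; the per-object overheads are rough guesses, not measurements.
final class FileListEstimate {

    /** Estimated bytes per File: path chars at 2 bytes each plus three 16-byte overheads. */
    static long bytesPerFile(int pathChars) {
        long charData = pathChars * 2L;   // 2 bytes per char
        long overhead = 16 + 16 + 16;     // File object, String, array in the String
        return charData + overhead;
    }

    /** Estimated bytes for a listing of fileCount files with paths of pathChars chars. */
    static long totalBytes(long fileCount, int pathChars) {
        return fileCount * bytesPerFile(pathChars);
    }

    public static void main(String[] args) {
        long total = totalBytes(400_000, 50);
        System.out.println("Estimated listing size: " + total + " bytes");
        // maxMemory() reports the heap limit this JVM was started with (-Xmx).
        System.out.println("Max heap of this JVM:   " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

With 50-character paths this gives 148 bytes per file and 59,200,000 bytes for 400,000 files, which is where the "about 60 MB" figure comes from.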

Otherwise...

Does it have to be portable? Can you do it in an OS-dependent way? If you can, then you could consider using Runtime.exec() to run "dir" (windows) or "ls" (UNIX) and saving the result in a file, then reading the file line-by-line to get the list of files.
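
A rough sketch of that approach, using ProcessBuilder (Java 7 or later) in place of Runtime.exec() so the output can be redirected straight into a temporary file. The class name ExternalListing, the exact commands ("ls -1", "cmd /c dir /b") and the println action are assumptions for illustration; adjust them for your platform.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of the original post.
final class ExternalListing {

    /** Picks a listing command for the current OS (an assumption; adjust as needed). */
    static List<String> listCommand() {
        boolean windows = System.getProperty("os.name").toLowerCase().contains("windows");
        return windows
                ? Arrays.asList("cmd", "/c", "dir", "/b")
                : Arrays.asList("ls", "-1");
    }

    /** Lists dir via an external process and returns the number of names read. */
    static long forEachName(File dir) throws IOException, InterruptedException {
        // The temp file lives in the system temp directory, not in dir itself.
        File listing = File.createTempFile("listing", ".txt");
        try {
            Process process = new ProcessBuilder(listCommand())
                    .directory(dir)
                    .redirectOutput(listing)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("listing command exited with " + exit);
            }
            long count = 0;
            // Read the saved listing one line at a time.
            try (BufferedReader reader = new BufferedReader(new FileReader(listing))) {
                String name;
                while ((name = reader.readLine()) != null) {
                    System.out.println(name);
                    count++;
                }
            }
            return count;
        } finally {
            listing.delete();
        }
    }
}
```

Only one name is held in the Java heap at a time; the listing itself sits on disk. File names containing newlines would be split across lines, which this sketch does not handle.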


Chris Beckey
Ranch Hand

Joined: Jun 09, 2006
Posts: 116

A couple of suggestions, none exactly elegant:
1.) I'm assuming that you have bumped up the heap (-Xms and -Xmx VM args)?
2.) Do all the files have to go in one directory? Could you hash the names and put them in subdirectories? There are filesystems that don't deal with huge directories well, or at all.
3.) Spawn a process that sends the directory contents to another file and then parse that file for the names (ugly and platform-specific, but it would work).
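
Suggestion 2 could look something like the sketch below, assuming you control where the files are written. The bucket count of 256, the two-digit hex directory names and the class name BucketedLayout are all arbitrary choices for illustration; Math.floorMod needs Java 8 or later.

```java
import java.io.File;

// Hypothetical layout helper; not part of the original post.
final class BucketedLayout {

    static final int BUCKETS = 256; // arbitrary; pick what suits your filesystem

    /** Maps a file name to a bucket directory name such as "0a" or "ff". */
    static String bucketFor(String fileName) {
        // floorMod keeps the result non-negative even for negative hash codes.
        int bucket = Math.floorMod(fileName.hashCode(), BUCKETS);
        return String.format("%02x", bucket);
    }

    /** Returns root/<bucket>/<fileName>; does not create anything on disk. */
    static File locate(File root, String fileName) {
        return new File(new File(root, bucketFor(fileName)), fileName);
    }
}
```

With 400,000 files spread over 256 buckets, each subdirectory would hold roughly 1,600 entries, small enough for File.list() on one bucket at a time.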
Nicole Lacoste
Ranch Hand

Joined: Oct 04, 2006
Posts: 30
Thanks for the great replies.

I have increased my memory, plus calling .list() and not .listFiles() saves memory, so for now it is working.

Using a separate process would make it more robust as my directory size grows. I will do that.

Thanks again,

Niki
 