large system.in to object concurrent operations

BV Boose
Ranch Hand

Joined: Jul 26, 2008
Posts: 33
Here's my problem: on a Windows box I need to find every location, across any number of drives, of each file in a list (or of a single file). The list currently contains 100 files.
My solution was to create a hash of the entire drive, using the file name as the key and an ArrayList of locations as the value.
Currently I'm exec'ing dir /s/b and processing the input stream all at once, creating a BufferedReader and then calling .readLine() on it:
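Roughly like this (a simplified sketch of the approach; the real code differs in the details):

import java.io.*;
import java.util.*;

public class DriveIndexer {
    public static void main(String[] args) throws IOException {
        // file name -> every full path where that name was found
        Map<String, List<String>> locations = new HashMap<String, List<String>>();

        // dir is a cmd built-in, so it has to be run through cmd /c
        Process p = Runtime.getRuntime().exec("cmd /c dir /s /b C:\\");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()));

        String line;
        while ((line = in.readLine()) != null) {
            String fileName = line.substring(line.lastIndexOf('\\') + 1);
            List<String> paths = locations.get(fileName);
            if (paths == null) {
                paths = new ArrayList<String>();
                locations.put(fileName, paths);
            }
            paths.add(line);
        }
        in.close();
    }
}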


But unless I increase the heap size I get JVM OutOfMemoryErrors. Ideally I'd like to buffer the InputStream and read from the created buffer concurrently, so I'm emptying it as it's being filled.
I think this would be faster and more efficient than reading the entire input stream into memory first and then converting it.
So: Can I do this?
If I can, should I do this?
How do I go about doing this?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669

BV Boose wrote:
I think this would be faster and more efficient than reading the entire input stream into memory first and then converting it.
So: Can I do this?
If I can, should I do this?
How do I go about doing this?


Since you are running out of memory, looking for "fast and efficient" code should not be your first question. Instead you want code which doesn't use up all of your memory.

Anyway, you are already reading the input stream one line at a time, so how to avoid reading the input stream all at once isn't the right question to ask.

As far as I can see you are writing a processed version of every line from that input stream into a hashmap of some kind. If your system is like mine then you are writing several hundred thousand entries into that hashmap. From your original question it seems that you only need to keep information on files which have certain specific names, so yes, I agree, storing all of them probably isn't a good idea.
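For example, something along these lines (just a sketch; here listOfFileNames stands for whatever collection holds the hundred names from your list, and in and locations are the reader and map you already have):

Set<String> wanted = new HashSet<String>(listOfFileNames);

String line;
while ((line = in.readLine()) != null) {
    String name = line.substring(line.lastIndexOf('\\') + 1);
    if (wanted.contains(name)) {
        List<String> paths = locations.get(name);
        if (paths == null) {
            paths = new ArrayList<String>();
            locations.put(name, paths);
        }
        paths.add(line);
    }
    // lines for files you don't care about are simply discarded
}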
BV Boose
Ranch Hand

Joined: Jul 26, 2008
Posts: 33
Part of the efficiency would be not eating up all the system resources. I thought that if I added the files to the hash as they're being added to the buffer it would consume less memory and be faster, though I could be wrong.


It's not just information for specific files: I do have an initial list of files, but I have to be able to do searches on the fly. In an early iteration I searched for each individual file name, and that was prohibitively slow.

Paul Clapham wrote:
Since you are running out of memory, looking for "fast and efficient" code should not be your first question. Instead you want code which doesn't use up all of your memory.

Anyway, you are already reading the input stream one line at a time, so how to avoid reading the input stream all at once isn't the right question to ask.

As far as I can see you are writing a processed version of every line from that input stream into a hashmap of some kind. If your system is like mine then you are writing several hundred thousand entries into that hashmap. From your original question it seems that you only need to keep information on files which have certain specific names, so yes, I agree, storing all of them probably isn't a good idea.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669

BV Boose wrote:Part of the efficiency would be not eating up all the system resources. I thought that if I added the files to the hash as they're being added to the buffer it would consume less memory and be faster, though I could be wrong.


That's not even wrong, as you aren't adding anything to any buffers. And adding an entry to a hash takes the same amount of memory no matter what else was happening when you did it.

If you want something less confusing, then don't use Runtime.exec() at all. Modify your application to just read the list of file names from the console (System.in) and then run it from the command line like this:
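Something like this, for example (the class name FileLocator is just a placeholder for whatever your class is called):

dir /s /b C:\ | java FileLocator

and inside the program, read from System.in instead of from the Process:

BufferedReader in = new BufferedReader(new InputStreamReader(System.in));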


That way you don't have any buffers confusing the issue and you can concentrate on the actual problem.
 