What's Best/Fastest way to read random parts of a file?

Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
I want to read parts of a file (e.g. position 375-392, or 4,586-4,599). What's the best way to do this? I assume I don't want to load the whole thing into an array via InputStream, right? What if I'll be eventually reading the entire file contents by the time the processing is done but I'll be jumping all over the file? Does that make a difference?
Chris Harris
Ranch Hand

Joined: Sep 21, 2003
Posts: 231
Hi Robert,
I would say the fastest way to do this would be to create a RandomAccessFile and get the Channel (from nio) to read and write the file.
The reading and writing of files has been massively improved by nio. In my experience, without nio RandomAccessFile is very slow.
Reading in the whole file may be quicker, depending on the size of the file and the percentage of it you want to process. If it is only a small file, it may be quicker to just read the whole thing in.
Chris


SCJP 1.2, SCWCD, SCBCD
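
For reference, the suggestion above (open a RandomAccessFile and read through its FileChannel) might look roughly like the sketch below. The class and method names (ChannelRangeReader, readRange) are made up for illustration, and the try-with-resources syntax assumes Java 7 or later, which is newer than the JDKs discussed in this thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of any library.
final class ChannelRangeReader {

    /** Reads bytes start..endInclusive of the file, e.g. 375-392. */
    static byte[] readRange(String path, long start, long endInclusive) throws IOException {
        int length = (int) (endInclusive - start + 1);
        ByteBuffer buffer = ByteBuffer.allocate(length);
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            long position = start;
            // A single read may deliver fewer bytes than requested, so loop
            // until the buffer is full.
            while (buffer.hasRemaining()) {
                // Positional read: does not change the channel's own position.
                int n = channel.read(buffer, position);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                position += n;
            }
        }
        return buffer.array();
    }
}
```

A call such as ChannelRangeReader.readRange("data.bin", 375, 392) would return the 18 bytes at those positions; the file name here is only a placeholder.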
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Chris]: In my experience, without nio RandomAccessFile is very slow.
My experience matches this, for JDK 1.2 and 1.3. From this I developed a habit of avoiding RAF like the plague; I could always find a faster solution using streams. Even for random access, you can create a new FileInputStream and use skip() to get to the spot you want, and read from there - and I found this to be faster than using the evil RAF. Even if I had to create a new FIS for each separate read (since you can't move backwards with a skip()).
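
A minimal sketch of the FileInputStream-plus-skip() approach described above, opening a fresh stream for each read. SkipReader and readAt are hypothetical names, and the try-with-resources syntax assumes Java 7+; on the older JDKs this advice targets, you would close the stream in a finally block instead.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, not part of any library.
final class SkipReader {

    /** Opens a new stream, skips to offset, and reads length bytes. */
    static byte[] readAt(String path, long offset, int length) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            long toSkip = offset;
            // skip() is allowed to skip fewer bytes than asked, so loop.
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    throw new EOFException("could not skip to offset " + offset);
                }
                toSkip -= skipped;
            }
            byte[] result = new byte[length];
            int filled = 0;
            // read() may also return fewer bytes than requested.
            while (filled < length) {
                int n = in.read(result, filled, length - filled);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                filled += n;
            }
            return result;
        }
    }
}
```

Because each call starts from the beginning of a new stream, reads can be issued in any order, at the cost of reopening the file every time.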
However recently I broke my own rules and tried using RAF again. It seems that the RAF in 1.4 is comparable to using a FileChannel - it's much, much faster than it used to be. In some cases it may even be faster than FileChannel, but that's probably because it's not always obvious how to get the best performance from FileChannel, and there's a bit of a learning curve with NIO. So for JDK 1.4+, I'd say there are three basic options:
  • Just use RAF.
  • Use FileChannel's position(long) and read(ByteBuffer) methods. (FileChannel has no seek(); position(long) is its equivalent.)
  • Use FileChannel's map() to get a MappedByteBuffer of the whole file.

The last takes more overhead: it doesn't make sense for a short file you're only accessing a few times, but if it's a big file, and/or if you're going to be accessing it a lot, it's the best way to go. Between the other two options, I'm not sure which is really preferable; test and find out. RAF is probably simpler, unless you want to use some of the other NIO-specific methods or classes. E.g. FileChannel's transferTo() and transferFrom() are pretty slick if you've got other channels to interact with. And a Selector is great for running an efficient server, which then encourages you to use channels and buffers throughout the system. But if you don't need other NIO features like that, RAF is probably fine nowadays.
But if you do have the misfortune to be using a JDK < 1.4, just stick to FileInputStream and skip(). RAF sux.
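
As a rough illustration of the first and third options, here is a sketch that reads the same range with a plain RandomAccessFile and with a MappedByteBuffer. The names (RandomReadOptions, readWithRaf, mapWholeFile, readFromMap) are invented for this example, and it assumes Java 7+ and a file smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of any library.
final class RandomReadOptions {

    /** Option 1: seek on an already-open RandomAccessFile and read. */
    static byte[] readWithRaf(RandomAccessFile raf, long offset, int length) throws IOException {
        byte[] result = new byte[length];
        raf.seek(offset);
        // readFully throws EOFException if the file ends before the array is full.
        raf.readFully(result);
        return result;
    }

    /** Option 3: map the whole file once, read-only. */
    static MappedByteBuffer mapWholeFile(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Copies length bytes starting at offset out of the mapped buffer. */
    static byte[] readFromMap(MappedByteBuffer map, int offset, int length) {
        byte[] result = new byte[length];
        // Work on a duplicate so the shared buffer's position is left untouched.
        ByteBuffer view = map.duplicate();
        view.position(offset);
        view.get(result);
        return result;
    }
}
```

With the map in hand, jumping around the file is just a matter of calling readFromMap with different offsets; the one-time cost of map() is what makes it a poor fit for small files read only a few times.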


"I'm not back." - Bill Harding, Twister
Robert Paris
Ranch Hand

Joined: Jul 28, 2002
Posts: 585
Thanks for the replies guys! Those helped a lot. We were using 1.3 but luckily switched to 1.4 recently, so that shouldn't be a problem. I think the best option is FileChannel's map(). BTW, what do you consider a big file? 1 meg+, 2 meg+? More?
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Robert]: what do you consider a big file?
I don't really know. The API for map() seems to indicate that something like 10-30 kB is still "small" for most systems. I'll guess that once you're in the MB range it's considered "big". But really, that's just a guess; I've done almost no direct comparison here, and the results probably vary a lot by machine anyway.
     