posted 19 years ago
Well, there are a few other choices available, but they're going to be more complex to deal with and, for most applications, probably not worth the effort. (Especially since you're posting this in Beginner: the alternate technique I'm going to describe is not recommended for a beginner.) I'd say they only have a chance of being faster if the number of lines to delete is very small and/or the lines are close to the end of the file. It's almost certainly best to first do this as EFH described and see how fast it actually is. If it really is too slow that way, then explore this alternate technique.
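For comparison, here is a minimal sketch of the simpler route with a rough timer around it. It assumes EFH's suggestion was the usual one: copy the lines you want to keep into a new file, then replace the original with it. The names RewriteBaseline and copyWithoutLines, and the set of 0-based line numbers to skip, are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RewriteBaseline {
    /** Copies every line of src to dst except those whose 0-based index is in skip. */
    static void copyWithoutLines(Path src, Path dst, Set<Integer> skip) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src);       // assumes UTF-8 text
             BufferedWriter out = Files.newBufferedWriter(dst)) {
            String line;
            int index = 0;
            while ((line = in.readLine()) != null) {
                if (!skip.contains(index)) {
                    out.write(line);
                    out.newLine(); // writes the platform's line separator
                }
                index++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("original", ".txt");
        Path tmp = Files.createTempFile("rewritten", ".txt");
        Files.write(src, Arrays.asList("keep 0", "drop 1", "keep 2", "drop 3"));

        long start = System.nanoTime();
        copyWithoutLines(src, tmp, new HashSet<>(Arrays.asList(1, 3)));
        // Replace the original only after the new file has been completely written.
        Files.move(tmp, src, StandardCopyOption.REPLACE_EXISTING);
        long micros = (System.nanoTime() - start) / 1000;

        System.out.println(Files.readAllLines(src) + " in " + micros + " us");
    }
}
```

On a real file, the printed time is the number to beat before anything fancier is worth trying. Note that this sketch normalizes line endings to the platform separator.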
There are three classes I know of which allow you to manipulate file contents directly, without writing a new file. They are:
RandomAccessFile
FileChannel
MappedByteBuffer
Actually all 3 are somewhat related - you can get a FileChannel from a RandomAccessFile, and a MappedByteBuffer from a FileChannel. The MappedByteBuffer is probably fastest for a really big file - but that's not guaranteed, and it may be limited by how much memory you have. So ultimately you may have to try each one to see how fast it really is for you. The techniques for using all 3 will be similar though. If you're not familiar with any of these, start with the RandomAccessFile first.
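To make the relationship concrete, here is a small sketch that reads the same byte through all three: the RandomAccessFile itself, the FileChannel from its getChannel() method, and a MappedByteBuffer from the channel's map() method. The class name ThreeViews and the readThreeWays helper are just for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ThreeViews {
    /** Reads the byte at offset pos through each of the three APIs. */
    static byte[] readThreeWays(Path file, long pos) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // 1. RandomAccessFile: seek, then read.
            raf.seek(pos);
            byte viaRaf = raf.readByte();

            // 2. FileChannel obtained from the RandomAccessFile.
            FileChannel channel = raf.getChannel();
            ByteBuffer one = ByteBuffer.allocate(1);
            channel.read(one, pos); // positional read: leaves the channel's own position alone
            byte viaChannel = one.get(0);

            // 3. MappedByteBuffer obtained from the FileChannel. A single mapping is limited
            //    to Integer.MAX_VALUE bytes, so a huge file would need several mappings.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte viaMap = map.get((int) pos);

            return new byte[] {viaRaf, viaChannel, viaMap};
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("views", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));
        byte[] b = readThreeWays(file, 1);
        System.out.println((char) b[0] + " " + (char) b[1] + " " + (char) b[2]); // e e e
    }
}
```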
For all 3 of these classes, there's no existing method to simply delete a range of bytes. You have to do it yourself by reading a range of bytes into a buffer and then writing them to a new location. That's much like the technique EFH described, except that here you have a much greater chance of screwing something up. So again, this is only really worthwhile (maybe) if there's a substantial portion at the beginning of the file that you don't need to move at all.
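Here is a minimal sketch of that read-into-a-buffer-and-copy step using RandomAccessFile. It deletes a single byte range by shifting everything after it toward the start, then truncates with setLength(). The helper names and the 8192-byte buffer size are arbitrary choices for illustration. Copying chunk by chunk from front to back is safe here only because the destination is never after the source.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteOneRange {
    /** Copies len bytes from offset src to offset dst, in chunks, where dst <= src. */
    static void moveTowardStart(RandomAccessFile raf, long src, long dst, long len) throws IOException {
        if (dst > src) {
            throw new IllegalArgumentException("dst must not be after src");
        }
        byte[] buf = new byte[8192];
        for (long done = 0; done < len; ) {
            int n = (int) Math.min(buf.length, len - done);
            raf.seek(src + done);
            raf.readFully(buf, 0, n); // read a chunk into the buffer...
            raf.seek(dst + done);
            raf.write(buf, 0, n);     // ...then write it at its new location
            done += n;
        }
    }

    /** Deletes the half-open byte range [start, end) by shifting the tail forward and truncating. */
    static void deleteRange(RandomAccessFile raf, long start, long end) throws IOException {
        long length = raf.length();
        moveTowardStart(raf, end, start, length - end);
        raf.setLength(length - (end - start)); // chop off the now-duplicated tail
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".bin");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            deleteRange(raf, 3, 5);
        }
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // 01256789
    }
}
```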
Let's say you have a file 10000 bytes long, and you need to delete the following regions (each range includes its start byte but not its end byte):
8000-8100
8500-8600
9000-9100
9500-9600
First, you don't need to touch anything from 0-8000. Great. Now how to delete 8000-8100? You could take all the bytes from 8100-10000 and copy them forward to 8000-9900. However this would be inefficient, since you're going to have to move many of the later bytes again to delete subsequent lines. Instead, you'd want to do something like this:
Copy 8100-8500 to 8000-8400.
Copy 8600-9000 to 8400-8800.
Copy 9100-9500 to 8800-9200.
Copy 9600-10000 to 9200-9600.
Set file length to 9600, deleting any remaining bytes.
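One way to code the steps above, as a sketch: it assumes the ranges are half-open (start included, end excluded), sorted, and non-overlapping, and it uses RandomAccessFile with a hypothetical deleteRanges helper that prints each copy as it goes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class CompactRanges {
    /**
     * Deletes the half-open byte ranges [starts[i], ends[i]) in a single pass and returns
     * the new file length. Assumes the ranges are sorted and do not overlap.
     */
    static long deleteRanges(RandomAccessFile raf, long[] starts, long[] ends) throws IOException {
        long fileLength = raf.length();
        if (starts.length == 0) {
            return fileLength; // nothing to delete
        }
        long write = starts[0]; // bytes before the first range never move
        byte[] buf = new byte[8192];
        for (int i = 0; i < starts.length; i++) {
            // The bytes kept after range i run up to the next range, or to the end of the file.
            long keepFrom = ends[i];
            long keepTo = (i + 1 < starts.length) ? starts[i + 1] : fileLength;
            System.out.printf("Copy %d-%d to %d-%d%n",
                    keepFrom, keepTo, write, write + (keepTo - keepFrom));
            // Copy in chunks toward the front; safe because write never passes read.
            for (long read = keepFrom; read < keepTo; ) {
                int n = (int) Math.min(buf.length, keepTo - read);
                raf.seek(read);
                raf.readFully(buf, 0, n);
                raf.seek(write);
                raf.write(buf, 0, n);
                read += n;
                write += n;
            }
        }
        raf.setLength(write); // drop whatever is left past the last copied byte
        return write;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("compact", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(10000); // a 10000-byte file; its contents don't matter for this demo
            long[] starts = {8000, 8500, 9000, 9500};
            long[] ends = {8100, 8600, 9100, 9600};
            System.out.println("New length: " + deleteRanges(raf, starts, ends));
        }
    }
}
```

Run as is, it should print the same four copy steps listed above, followed by "New length: 9600". Every surviving byte after 8000 is moved exactly once, 1600 bytes in total. Deleting the regions one at a time from front to back, shifting the whole tail each time, would move 1900 + 1400 + 900 + 400 = 4600 bytes instead.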
If you're comfortable working out how to code something like this, then this technique may be worth trying. Note that if you ever want to insert lines, or even edit lines in a way that might increase their length, this won't work, because you will end up overwriting some bytes before you have a chance to copy them. It may be possible to approach this from the other side instead, starting at the end of the file. But really, that's going to be a very complex, ugly thing to do. I do not recommend it.
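For the one case where starting from the end is still manageable, a single insertion, a sketch looks like this. It is only meant to show why the copy direction matters: each chunk is taken from the end of the not-yet-moved region, so nothing is overwritten before it has been read. The insertBytes name is hypothetical, and this does not handle several edits at once, which is where it gets ugly.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InsertFromEnd {
    /** Inserts data at offset pos, shifting the rest of the file toward the end. */
    static void insertBytes(RandomAccessFile raf, long pos, byte[] data) throws IOException {
        long oldLength = raf.length();
        raf.setLength(oldLength + data.length); // make room at the end first
        byte[] buf = new byte[8192];
        long end = oldLength; // bytes in [pos, end) have not been moved yet
        while (end > pos) {
            int n = (int) Math.min(buf.length, end - pos);
            long from = end - n;          // take the LAST unmoved chunk...
            raf.seek(from);
            raf.readFully(buf, 0, n);
            raf.seek(from + data.length); // ...and write it data.length bytes later; this can only
            raf.write(buf, 0, n);         // overwrite bytes already read into buf, or the new space
            end = from;
        }
        raf.seek(pos);
        raf.write(data); // finally fill the gap
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("insert", ".txt");
        Files.write(file, "abcdef".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            insertBytes(raf, 2, "XY".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // abXYcdef
    }
}
```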
Again, I strongly recommend trying this the way EFH recommended first. It will definitely be easier, and it's much more flexible if you want to do other things like insert or edit lines.
Good luck...
"I'm not back." - Bill Harding, Twister