Checking a file has been split properly

d jones
Ranch Hand

Joined: Mar 13, 2006
Posts: 76

I am writing a program that splits a file into multiple smaller files. Afterwards I need to make sure that all of the data made it into the split files.

After the split, I was thinking about counting the number of lines in the master file and then counting the number of lines in each of the smaller files. The sum of the lines in the smaller files should equal the number of lines in the master file. I suspect this might take a long time, and I'm wondering if there is a quicker way of doing it.
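The line-count audit described above could be sketched roughly as follows. This is only an illustration: the class and method names (LineCountAudit, countLines, lineCountsMatch) are made up for this example, and it assumes the split always happens on line boundaries. ISO-8859-1 is used only so that any byte sequence decodes without error; it does not affect where lines end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of any library.
final class LineCountAudit {

    // Counts lines as BufferedReader.readLine() sees them:
    // a line ends at \n, \r, or \r\n, or at end of file.
    static long countLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    // True if the line counts of the parts add up to the master's line count.
    static boolean lineCountsMatch(Path master, List<Path> parts) throws IOException {
        long total = 0;
        for (Path part : parts) {
            total += countLines(part);
        }
        return total == countLines(master);
    }
}
```

A matching count only shows that no lines were dropped or added; it says nothing about whether the content of each line survived. If a split lands in the middle of a line, the parts will report more lines than the master.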

Another possibility is the File.length method. Should the sum of the lengths of the smaller files equal the length of the master file?
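For reference, the size comparison asked about here might look something like the sketch below. The names SizeAudit and sizesMatch are invented for the example. It uses Files.size from java.nio.file rather than File.length; both report the size in bytes, but Files.size throws an IOException for a missing file, whereas File.length quietly returns 0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SizeAudit {

    // True if the byte sizes of the parts add up to the master's byte size.
    // No file content is read; only file metadata is consulted.
    static boolean sizesMatch(Path master, List<Path> parts) throws IOException {
        long total = 0;
        for (Path part : parts) {
            total += Files.size(part);
        }
        return total == Files.size(master);
    }
}
```

This check is cheap because it never opens the files, but it only holds if the split copies bytes unchanged, and it cannot detect content that was altered without changing its length.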

Would appreciate your thoughts and ideas on this.

Many Thanks
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Sounds like your files are all text? A line count audit should add up OK. A byte count might not, since you're losing one line terminator per file (except the last file) in the split, and you might change the line break from CR to CRLF or vice versa.

Probably the best way to make sure you didn't lose or break any data would be to put the files back together and see if the new merged file matches the original, maybe with a 3rd party compare tool.
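As an alternative to writing a merged file and running an external compare tool, the same check can be done in code by reading the parts back to back and comparing them against the original byte by byte. The sketch below is one way to do that; MergeCompare and partsReproduceMaster are names invented for this example, and it assumes the parts are passed in the order they were split.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of any library.
final class MergeCompare {

    // True only if the parts, read one after another in the given order,
    // reproduce the master byte for byte.
    static boolean partsReproduceMaster(Path master, List<Path> parts) throws IOException {
        try (InputStream original = new BufferedInputStream(Files.newInputStream(master))) {
            for (Path part : parts) {
                try (InputStream piece = new BufferedInputStream(Files.newInputStream(part))) {
                    int b;
                    while ((b = piece.read()) != -1) {
                        if (original.read() != b) {
                            return false; // byte differs, or master ended early
                        }
                    }
                }
            }
            // The master must have nothing left once all parts are consumed.
            return original.read() == -1;
        }
    }
}
```

Because this compares raw bytes, it will report a mismatch if the split changed or dropped any line terminators, even when every line of text is otherwise intact.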

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
Hm, Stan seems to be making several assumptions about how line terminators are being processed here, and I'm not sure they're warranted. I think the poster needs to determine whether things like line terminators will be changed during processing, and whether a new line terminator may be added at the end of each split file. Offhand I don't think either of those is necessary, though they may be desirable, or not, depending on what this application is for.

If you're concerned about time, note that counting lines in a file ultimately requires reading each and every byte of the file to discover whether it's a line terminator (or part of one). If you're going to do that, you might as well also use some sort of checksum to verify that all the data is valid, not just the number of lines. The time spent calculating a checksum should be small compared to the time spent reading from the file in the first place. If you want something less reliable but much faster, and if you're not changing line terminators (something I typically find unnecessary or even undesirable anyway), then simply adding up the total file sizes should work reasonably well.
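A checksum comparison along these lines might look like the sketch below. The post does not name a particular checksum; CRC32 from java.util.zip is used here only because it ships with the JDK, and the names ChecksumAudit, crcOf and checksumsMatch are invented for the example. It assumes the split leaves the bytes unchanged and that the parts are listed in their original order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper, not part of any library.
final class ChecksumAudit {

    // CRC32 over the given files, treated as one continuous byte stream.
    static long crcOf(List<Path> files) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    crc.update(buffer, 0, n);
                }
            }
        }
        return crc.getValue();
    }

    // True if the master and the concatenated parts have the same CRC32.
    static boolean checksumsMatch(Path master, List<Path> parts) throws IOException {
        return crcOf(Collections.singletonList(master)) == crcOf(parts);
    }
}
```

If the splitter already reads every byte of the master, the master's checksum could be accumulated during the split itself, which would save one full pass over the file. A matching CRC32 is strong evidence rather than proof; a cryptographic digest via java.security.MessageDigest would make accidental collisions less likely at some extra CPU cost.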

"I'm not back." - Bill Harding, Twister