I have a web service which consumes a TAR archive of binary files and passes them as a List of files to another subsystem for processing. I am using the Apache Commons Compress library (version 1.11) to work with the TAR formatted input.
I am finding that when I use TarArchiveEntry#getSize to determine the size, allocate storage for the contents, and then call TarArchiveInputStream#read, the resulting files can differ from the original files in the uploaded archive. If I instead read from the TarArchiveInputStream in chunks, the resulting files are fine. I also noticed that the smallest of the three files in the archive (594 bytes) is identical with both implementations.
This is my first time working with the library, so I am probably missing something. Any ideas or suggestions?
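The original code is not shown, so what follows is a hedged sketch, not the actual implementation. It assumes the problematic version allocates a buffer from getSize and fills it with a single read(byte[]) call. If so, one plausible explanation is the java.io.InputStream contract: read may return fewer bytes than requested, and its return value reports how many bytes were actually stored. The self-contained sketch below simulates this without Commons Compress. It uses a hypothetical ChunkyInputStream that caps each read at 4096 bytes, an arbitrary assumed value standing in for TarArchiveInputStream. It then compares a single read against a loop that keeps reading until the buffer is full. A small entry could arrive in one call, which would be consistent with the 594-byte file matching in both runs.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {

    // Hypothetical stream that hands back at most 'maxChunk' bytes per read call,
    // mimicking a stream that delivers data in pieces (an assumption, not TarArchiveInputStream).
    static class ChunkyInputStream extends FilterInputStream {
        private final int maxChunk;

        ChunkyInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    // Suspected problematic pattern: one read() call with its return value ignored.
    static byte[] readOnce(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        in.read(buf); // may fill only the first part of buf; the rest stays zero
        return buf;
    }

    // Hypothetical helper: keep calling read() until 'size' bytes have arrived.
    static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(buf, off, size - off);
            if (n < 0) {
                throw new IOException("Unexpected end of stream after " + off + " bytes");
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i * 31);
        }

        byte[] once = readOnce(
                new ChunkyInputStream(new ByteArrayInputStream(original), 4096), original.length);
        byte[] full = readFully(
                new ChunkyInputStream(new ByteArrayInputStream(original), 4096), original.length);

        System.out.println("single read matches: " + Arrays.equals(original, once)); // false
        System.out.println("read loop matches:   " + Arrays.equals(original, full)); // true
    }
}
```

With a TarArchiveInputStream, the same loop would presumably be called with (int) entry.getSize() for each entry, because getSize returns a long. The standard library's java.io.DataInputStream#readFully offers equivalent "read until full" behaviour.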
Working Code
Console output:
Contents MD5: df194ba4f2fe114be709c5605839930f (9627051 bytes)
Contents MD5: 3996f04fc6a830520c336825ef5afc1b (508571 bytes)
Contents MD5: 1cf5fca3f6209042fac634f718d30d43 (594 bytes)
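The code that printed these lines is not included, so this is an assumed reconstruction of the comparison step only. It computes a lowercase hex MD5 digest with java.security.MessageDigest and formats it in the same shape as the console output above. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Report {

    // Hex-encode the MD5 digest of the given bytes (32 lowercase hex characters).
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(data);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Build a line like "Contents MD5: <hex> (<n> bytes)".
    static String report(byte[] data) throws NoSuchAlgorithmException {
        return "Contents MD5: " + md5Hex(data) + " (" + data.length + " bytes)";
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // prints "Contents MD5: 5d41402abc4b2a76b9719d911017c592 (5 bytes)"
        System.out.println(report("hello".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Comparing these digests against ones computed from the original files before archiving is a quick way to tell which extraction path alters the contents.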
Problematic Code
Console output:
Contents MD5: 3ee34d1e3ad7761303107cf9c3a5f6ad (9627051 bytes)
Contents MD5: c5c5dd952977fa6068d717586e57d9a8 (508571 bytes)
Contents MD5: 1cf5fca3f6209042fac634f718d30d43 (594 bytes)