GZIP compression is very, very different from ZIP compression.
Unlike ZIP, GZIP can only store one single file. The name of the file is usually the same as the GZIP file minus the .gz extension. This is also why TAR is so popular in combination with GZIP: it packs multiple files (and folders) into one file that can then be gzipped.
If you have a file called "myfile.tar.gz", unzipping it with gunzip will create the file "myfile.tar". If you rename the .gz file, the resulting file will also be named differently.
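To make the naming convention above concrete, here is a minimal Java sketch. The class and method names (GzipNames, stripGzSuffix) are made up for illustration, and it only handles the plain ".gz" case; it does not try to mirror every suffix rule gunzip knows about.

```java
final class GzipNames {
    /**
     * Returns the name the decompressed file gets by convention:
     * the archive name with the trailing ".gz" removed.
     * Hypothetical helper, only the plain ".gz" suffix is handled.
     */
    static String stripGzSuffix(String gzName) {
        if (gzName.length() > 3 && gzName.endsWith(".gz")) {
            return gzName.substring(0, gzName.length() - 3);
        }
        throw new IllegalArgumentException("Not a .gz name: " + gzName);
    }
}
```

Calling GzipNames.stripGzSuffix("myfile.tar.gz") gives "myfile.tar", so a renamed archive simply yields a differently named result.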
Now if you have not used the same naming approach when creating the GZIP file, then there is no way at all to retrieve the original name.
The gzip file I mentioned was not created using the tar utility. For example, RDC2008081300085624.DAT.gz
As I understand it, if the gz file is created with the Tar API, you can look inside the gzip file and check whether the entry name in it is valid.
What I actually want is a way to validate the entry in a gzip file that was not created by the tar utility (if it were created by tar, I could use, for example, getTarEntry() from the available API) and check whether the entry's file name follows the validation rule. If it does not, I can simply reject that gzip file and not allow it to be processed further.
My guess is that there is no way that we can jump into that type of gzip file and get the entry information.
The closest thing you can get out of the file is "RDC2008081300085624.DAT". Seeing that it has a timestamp in it (2008-08-13 00:08:56, don't know what the 24 is) you could get it back to RDC.DAT, but that's about it.
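If the derived name is all you have, a validation check could look roughly like the sketch below. The rule itself is a guess based only on the one example name in this thread ("RDC", a 14-digit timestamp, two more digits of unknown meaning, then ".DAT"); the class and method names are made up, so adjust the pattern to the real rule.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RdcNameValidator {
    // Assumed rule, inferred from a single example name:
    // "RDC" + yyyyMMddHHmmss + 2 digits (meaning unknown) + ".DAT"
    private static final Pattern NAME =
            Pattern.compile("RDC(\\d{14})(\\d{2})\\.DAT");
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Returns the embedded timestamp, or empty if the name breaks the assumed rule. */
    static Optional<LocalDateTime> parseTimestamp(String entryName) {
        Matcher m = NAME.matcher(entryName);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(m.group(1), STAMP));
        } catch (DateTimeParseException e) {
            // Right shape, but not a real date/time (e.g. month 13).
            return Optional.empty();
        }
    }
}
```

A file whose name yields an empty Optional would then be rejected before any further processing.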
And TAR was just an example because that's where GZIP is used the most (at least in the Unix / Linux world). You can use GZIP to compress any single file. [ September 18, 2008: Message edited by: Rob Prime ]
You are right about the name; if you use "gunzip -N" you can get the original file name back, unless the file was zipped using "gzip -n". (Guess I learned something new today.) That's not the default behaviour though (gunzip -n and gzip -N are the defaults), and it is not supported by the java.util.zip classes.
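Since java.util.zip does not expose the stored name, one option is to read the header yourself. The sketch below follows the header layout from the gzip format spec (RFC 1952): two magic bytes, the compression method, a flags byte, six more fixed bytes, an optional extra field, then the optional zero-terminated original name. The class and method names are made up, and it only looks at the first member of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class GzipHeaderName {
    private static final int FEXTRA = 0x04; // extra field present
    private static final int FNAME = 0x08;  // original file name present

    /** Returns the name stored in the gzip header, or null if there is none. */
    static String readOriginalName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readUnsignedByte() != 0x1f || in.readUnsignedByte() != 0x8b) {
            throw new IOException("Not in gzip format");
        }
        in.readUnsignedByte();             // CM (compression method)
        int flags = in.readUnsignedByte(); // FLG
        in.readFully(new byte[6]);         // MTIME (4 bytes), XFL, OS
        if ((flags & FEXTRA) != 0) {
            // XLEN is a little-endian 16-bit length, followed by that many bytes.
            int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            in.readFully(new byte[xlen]);
        }
        if ((flags & FNAME) == 0) {
            return null; // e.g. created with "gzip -n"
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        for (int b = in.readUnsignedByte(); b != 0; b = in.readUnsignedByte()) {
            name.write(b);
        }
        // The spec defines the name as ISO 8859-1 (Latin-1).
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

You would pass it a fresh FileInputStream on the .gz file, validate the returned name, and then open the file again with GZIPInputStream for the actual decompression. A null result means the name was not stored, so only the .gz file name itself is left to check.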
gzip does NOT support multiple files though; when you pass multiple files as arguments, it will convert each of them into its own .gz file. I read the entire man page, and found nothing about multiple files per archive; only the behaviour I just described.
Joined: May 15, 2008
"the gzip format allows one .gz file to contain multiple compressed files"
And your Wikipedia quote is either older or you misinterpreted it. What I found about multiple files:
Although its file format also allows for multiple such streams to be concatenated (zipped files are simply decompressed concatenated as if they were originally one file), gzip is normally used to compress just single files. Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip. The final .tar.gz or .tgz file is usually called a tarball.
So yes, according to this quote it is possible to compress multiple files. But in the end, it will turn up as one huge file with all separate file contents chained. So all in all, you can still get one single file from it. You would have to separate it yourself to get the original multiple files back.
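The "one chained file" behaviour can be tried out from Java as well. The sketch below compresses two strings separately, glues the two .gz byte arrays together, and decompresses the result with GZIPInputStream. It assumes a reasonably recent JDK, where GZIPInputStream continues into the next member; the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzip {
    /** Compresses one string into a complete gzip member. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Joins two gzip members back to back, like "cat a.gz b.gz". */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    /** Decompresses everything the stream yields into a single string. */
    static String gunzipAll(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            for (int n = gz.read(buf); n != -1; n = gz.read(buf)) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decompressing concat(gzip("first\n"), gzip("second\n")) this way gives back "first\nsecond\n" as one piece of data, with no marker where one member ended and the next began.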
It stores them separately. It is very clearly described there (wikipedia).
No, the GNU gzip v1.3.12 program does not give any easy way to decode each separately. But the format allows and supports it.
But it also says "zipped files are simply decompressed concatenated as if they were originally one file". So adding multiple files is no problem. Getting them back as multiple files is, because you only get one file back.
That depends on the tool used for extraction. I believe the original poster wants to use some Java code and not the GNU zip utility.