Access the entry such as entry filename in gzip file

Ken Kirin
Greenhorn

Joined: Dec 02, 2004
Posts: 26
Hi all,

Does anyone know how to access entry information in a gzip file, such as the file name, the way the zip API in java.util.zip allows (by calling getNextEntry() on ZipInputStream and then getName() on the returned ZipEntry)?

The gzip file I have is created from GZIPOutputStream.

Thanks!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

GZIP compression is very, very different from ZIP compression.

Unlike ZIP, GZIP can only store one single file. The name of the file is usually the same as the name of the GZIP file minus the .gz extension.
This is also why TAR is so popular in combination with GZIP: it packs multiple files (and folders) into one file that can then be gzipped.

If you have a file called "myfile.tar.gz", unzipping it with gunzip will create the file "myfile.tar". If you rename the .gz file, the resulting file will be named differently as well.

Now if you did not use the same naming approach when creating the GZIP file, then there is no way at all to retrieve the original name.


Ken Kirin
Greenhorn

Joined: Dec 02, 2004
Posts: 26
Rob Prime,

The gzip file I mentioned was not created using the tar utility. For example, RDC2008081300085624.DAT.gz

As I understand it, if the gz file is created by a Tar API, you can look inside the gzip file and check whether the entry name in it is valid.

What I actually want is a way to validate the entry in a gzip file that was not created by the Tar utility (if it were, I could use, for example, getTarEntry() from the available API) and check whether the entry file name follows the validation rule. If it does not, I can simply reject that gzip file and not allow it to be processed further.

My guess is that there is no way that we can jump into that type of gzip file and get the entry information.

Cheers!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

The closest thing you can get out of the file is "RDC2008081300085624.DAT". Seeing that it has a timestamp in it (2008-08-13 00:08:56, don't know what the 24 is) you could get it back to RDC.DAT, but that's about it.

And TAR was just an example because that's where GZIP is used the most (at least in the Unix / Linux world). You can use GZIP to compress any single file.
[ September 18, 2008: Message edited by: Rob Prime ]
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
You are wrong.

- gzip DOES store the filename
- gzip DOES allow more files per archive

Read the man page at least.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

You are right about the name; if you use "gunzip -N" you can get the original file name back, unless the file was zipped using "gzip -n". (Guess I learned something new today.)
That's not default behaviour though (gunzip -n and gzip -N are defaults), and not supported with the java.util.zip classes.

gzip does NOT support multiple files though; when you pass multiple files as arguments, it will convert each of them into its own .gz file. I read the entire man page, and found nothing about multiple files per archive; only the behaviour I just described.
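
For anyone who wants to check for a stored name from Java: the name that "gzip -N" saves lives in the optional FNAME field of the gzip header described in RFC 1952. Below is a minimal sketch of reading that field by hand, assuming the header layout from the RFC. The class name GzipHeaderName and the method readStoredName are made up for illustration and are not part of java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the JDK: reads the optional FNAME field
// from the header of the first gzip member (layout per RFC 1952).
final class GzipHeaderName {
    private static final int FEXTRA = 0x04; // FLG bit: extra field present
    private static final int FNAME = 0x08;  // FLG bit: original file name present

    static Optional<String> readStoredName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        // ID1 and ID2: the two gzip magic bytes.
        if (in.readUnsignedByte() != 0x1f || in.readUnsignedByte() != 0x8b) {
            throw new IOException("Not a gzip stream");
        }
        in.readUnsignedByte();             // CM: compression method (8 = deflate)
        int flags = in.readUnsignedByte(); // FLG: tells which optional fields follow
        in.readFully(new byte[6]);         // MTIME (4 bytes), XFL, OS

        if ((flags & FEXTRA) != 0) {
            // XLEN is a little-endian 16-bit length, followed by XLEN bytes.
            int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            in.readFully(new byte[xlen]);
        }
        if ((flags & FNAME) == 0) {
            return Optional.empty();       // no name was stored in this file
        }
        // FNAME is a zero-terminated ISO 8859-1 string.
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.readUnsignedByte()) != 0) {
            name.write(b);
        }
        return Optional.of(new String(name.toByteArray(), StandardCharsets.ISO_8859_1));
    }
}
```

Calling GzipHeaderName.readStoredName(new FileInputStream(file)) would return the stored name if there is one. Note that files written by java.util.zip.GZIPOutputStream have no flags set in the header, so for those the result is empty and only the name of the .gz file itself is available.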
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
From wikipedia:

"the gzip format allows one .gz file to contain multiple compressed files"
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

From both experience and the man pages: nonsense.

And your Wikipedia quote is either older or you misinterpreted it. What I found about multiple files:
"Although its file format also allows for multiple such streams to be concatenated (zipped files are simply decompressed concatenated as if they were originally one file), gzip is normally used to compress just single files. Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip. The final .tar.gz or .tgz file is usually called a tarball."

So yes, according to this quote it is possible to compress multiple files. But in the end, it will turn up as one huge file with all separate file contents chained. So all in all, you can still get one single file from it. You would have to separate it yourself to get the original multiple files back.
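
The "one huge file" behaviour can be reproduced from Java. The sketch below assumes a JDK on which GZIPInputStream reads concatenated members (Java 7 and later; older releases stopped after the first member). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: two independently compressed gzip streams are glued
// together and then read back through a single GZIPInputStream.
final class ConcatenatedGzipDemo {

    // Compresses one string into a complete gzip stream (header, data, trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Plain byte concatenation, the same thing "cat a.gz b.gz" would produce.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] all = new byte[a.length + b.length];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        return all;
    }

    // Decompresses everything GZIPInputStream is willing to return.
    static String gunzipAll(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With this, gunzipAll(concat(gzip("first\n"), gzip("second\n"))) gives back the two contents chained into one string, with no marker showing where the first ended and the second began.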
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
It stores them separately.
It is very clearly described there (wikipedia).

No, the GNU gzip v1.3.12 program does not give any easy way to decode each separately. But the format allows and supports it.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

Can you post the Wikipedia URL then? Because http://en.wikipedia.org/wiki/Gzip says nothing about multiple files except what I have quoted before.
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
But that's it.

"Although its file format also allows for multiple such streams to be concatenated."
The "streams" are "files". They have a length, name and body (file content).

See also RFC 1952. The stream is called "member" there.

Each stream/member stores one complete compressed file, and the GZ file can have more than one such stream/member.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

But it also says "zipped files are simply decompressed concatenated as if they were originally one file". So adding multiple files is no problem. Getting them back as multiple files is, because you only get one file back.
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
That depends on the tool used for extraction. I believe the original poster wants to use some Java code and not the GNU zip utility.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

Hmm, we are getting quite off-topic.

Conclusion: it's not possible with Java. You can only retrieve the stored contents as one large stream which would have to be separated by the programmer himself.
David Balažic
Ranch Hand

Joined: May 15, 2008
Posts: 86
It is not possible with the current version of java.util.zip.GZIPInputStream.

It is quite possible in Java (by writing your own code to do it).
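
A rough sketch of what such code could look like, assuming the whole .gz file fits in memory and follows the member layout of RFC 1952. It uses java.util.zip.Inflater in raw ("nowrap") mode to find where each member's compressed data ends. GzipMemberSplitter and its methods are hypothetical names; the CRC32 and ISIZE values in each trailer are skipped, not verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper: returns the decompressed content of each gzip member
// separately instead of one concatenated stream.
final class GzipMemberSplitter {

    static List<byte[]> splitMembers(byte[] gz) throws IOException {
        List<byte[]> members = new ArrayList<>();
        int pos = 0;
        while (pos < gz.length) {
            pos = skipHeader(gz, pos);
            Inflater inflater = new Inflater(true); // true = raw deflate, no zlib wrapper
            try {
                inflater.setInput(gz, pos, gz.length - pos);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                while (!inflater.finished()) {
                    int n = inflater.inflate(chunk);
                    if (n == 0 && !inflater.finished()
                            && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IOException("Truncated or invalid deflate data");
                    }
                    out.write(chunk, 0, n);
                }
                // Bytes the inflater did not consume belong to the trailer and later members.
                pos = gz.length - inflater.getRemaining();
                members.add(out.toByteArray());
            } catch (DataFormatException e) {
                throw new IOException("Corrupt deflate data", e);
            } finally {
                inflater.end();
            }
            if (gz.length - pos < 8) {
                throw new IOException("Truncated gzip trailer");
            }
            pos += 8; // skip CRC32 (4 bytes) and ISIZE (4 bytes) without checking them
        }
        return members;
    }

    // Returns the offset of the first compressed byte of the member starting at pos.
    private static int skipHeader(byte[] gz, int pos) throws IOException {
        if (gz.length - pos < 10 || (gz[pos] & 0xff) != 0x1f || (gz[pos + 1] & 0xff) != 0x8b) {
            throw new IOException("Not a gzip member at offset " + pos);
        }
        int flags = gz[pos + 3] & 0xff;
        pos += 10; // ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        if ((flags & 0x04) != 0) { // FEXTRA: little-endian length, then that many bytes
            if (gz.length - pos < 2) throw new IOException("Truncated gzip header");
            pos += 2 + ((gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8));
        }
        if ((flags & 0x08) != 0) pos = skipZeroTerminated(gz, pos); // FNAME
        if ((flags & 0x10) != 0) pos = skipZeroTerminated(gz, pos); // FCOMMENT
        if ((flags & 0x02) != 0) pos += 2;                          // FHCRC
        if (pos > gz.length) throw new IOException("Truncated gzip header");
        return pos;
    }

    private static int skipZeroTerminated(byte[] gz, int pos) throws IOException {
        while (pos < gz.length && gz[pos] != 0) pos++;
        if (pos >= gz.length) throw new IOException("Truncated gzip header");
        return pos + 1; // step over the terminating zero byte
    }
}
```

The design choice here is to let the Inflater report how many input bytes were left over, since the gzip format itself does not record the compressed length of a member anywhere in the header.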
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19544
    

Originally posted by David Balažic:
It is not possible with the current version of java.util.zip.GZIPInputStream.

That's what I meant. Thanks for the correction
 