GeeCON Prague 2014*


JavaRanch » Java Forums » Java » I/O and Streams

Zip file archive comment with extended ASCII characters

Ivan Bell
Greenhorn

Joined: Mar 11, 2011
Posts: 3

I am trying to figure out how to write the registered trademark sign '®' and the copyright sign '©' to the archive comment of a zip file. This is not the comment for a ZipEntry (although the solution may be similar) but the comment for the whole zip file.

I have tried a lot of different things; but, at the end of the day, the setComment() method on the JarOutputStream (which extends ZipOutputStream) writes a "circumflex a" (i.e., an 'Â') before the extended ASCII characters.

So, instead of:

MySoftware®
Copyright © 2011


I get:

MySoftwareÂ®
Copyright Â© 2011


when viewing the archive comments using WinZip or PKZIP or 7-ZIP or any other archive tool I have tried.
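One plausible reading of the stray 'Â', assuming setComment() encodes the comment as UTF-8 (JDK 6's ZipOutputStream appears to do this): '©' (U+00A9) becomes the two bytes 0xC2 0xA9, and a viewer that decodes the comment with a single-byte code page such as windows-1252 shows 0xC2 as 'Â'. The sketch below reproduces that effect with the standard library only; the class and method names are hypothetical, and ISO-8859-1 stands in for windows-1252 (the two agree on these byte values).

```java
import java.nio.charset.Charset;

// Hypothetical demo class: reproduces the "Â" artifact without any zip file.
class MojibakeDemo {
    // Encode as UTF-8 (assumed to be what setComment() does), then decode the
    // same bytes as ISO-8859-1, a single-byte code page like the one an
    // archive viewer might use to display the comment.
    static String asSeenByViewer(String comment) {
        byte[] utf8 = comment.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        // The copyright sign is encoded as 0xC2 0xA9; 0xC2 is displayed as 'Â'.
        System.out.println(asSeenByViewer("Copyright \u00A9 2011"));
        System.out.println(asSeenByViewer("MySoftware\u00AE"));
    }
}
```

The two calls yield "Copyright Â© 2011" and "MySoftwareÂ®", matching the output described above.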

I have tried converting to Unicode; but, since the setComment() implementation only writes single bytes, I get a literal '\u00A9' string in the comment.

Does anyone have a solution for this? Or know how to write the comments to a zip file without using the setComment() method (appending the comment directly to the end of the file)? I have tried the latter, but I am somehow corrupting the archive when doing so.

I know that I could simply use '(R)' and '(c)' instead, but I would rather use the extended ASCII characters, as they look better. I also know that this can be done via WinZip's command line utility; but I would like to use Java so I don't have to buy a zip license just to add an archive comment to a jar file.

Thank you in advance for any help you can offer.


Ivan
"Up the Irons!"
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42047
The java.util.zip package is not particularly Unicode-savvy (see this old bug, due to be fixed in Java 7). Unless you can wait for that, check out the Apache Commons Compress library (which includes a UnicodeCommentExtraField class that looks promising in this context).
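For code that can require Java 7 or later, the fix appears to have arrived as charset-aware constructors such as ZipOutputStream(OutputStream, Charset), whose charset is also applied to the archive comment. A minimal sketch, assuming the archive tools display the comment as windows-1252/ISO-8859-1 (which the 'Â' artifact suggests); the helper name and the placeholder entry are hypothetical. JarOutputStream has no Charset constructor, so this uses ZipOutputStream directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, Java 7+ only: ZipOutputStream(OutputStream, Charset)
// encodes entry names and comments, including the archive comment, with that charset.
class Java7ZipComment {
    static byte[] zipWithComment(String comment) {
        Charset latin1 = Charset.forName("ISO-8859-1"); // copyright sign -> single byte 0xA9
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            ZipOutputStream zip = new ZipOutputStream(bytes, latin1);
            try {
                zip.putNextEntry(new ZipEntry("readme.txt")); // placeholder entry
                zip.write("hello".getBytes(latin1));
                zip.closeEntry();
                zip.setComment(comment); // written out when the stream is closed
            } finally {
                zip.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("writing to memory should not fail", e);
        }
        return bytes.toByteArray();
    }
}
```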


Ivan Bell
Greenhorn

Joined: Mar 11, 2011
Posts: 3

Yeah, I saw the upcoming Charset argument in the Java 7 ZipFile and ZipOutputStream constructors; however, the majority of my customers are still on 1.5 and it is unlikely that they will shift to 1.7 anytime soon.

I am currently investigating how to manually write/replace the comment in the zip file (without corrupting it). This is probably the only way to fix it without patching a whole lot of java.util.jar and java.util.zip classes.

I will post the solution when I figure it out. Anyone else who already knows how to manually add/replace an archive comment in the zip file can post it first.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42047
"Anyone else who already knows how to manually add/replace an archive comment in the zip file can post it first"

The Commons Compress library is no good?
Ivan Bell
Greenhorn

Joined: Mar 11, 2011
Posts: 3

The Commons distro is fine; however, I wanted to find the solution for my own edification and as an intellectual exercise.

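The solution is really very simple (in hindsight, of course). After studying the Zip file format, I found that the archive comment length and content are stored at the very end of the zip file, in the "end of central directory" record. That record begins with the "magic" byte sequence 0x50, 0x4b, 0x05, 0x06 ("PK\5\6"); the two-byte comment length sits 20 bytes after the start of that signature, and the comment itself follows immediately. By finding this byte sequence in the zip file (searching backward from the end), you can read or write the comment. If there is no comment, the length field is 0x00, 0x00 and forms the last two bytes of the file, so you overwrite those two bytes with the new length and append the comment string after them.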
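The lookup described above can be sketched in plain Java on an in-memory copy of the archive. This assumes the standard end of central directory layout (a 22-byte record ending with the comment length at offset 20, followed by the comment); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical reader for the archive comment, using only byte arithmetic.
class ZipCommentReader {
    static final int EOCD_FIXED_SIZE = 22; // EOCD record length without the comment

    // Scan backward for "PK\5\6" (0x50 0x4b 0x05 0x06). A match only counts if its
    // comment length field accounts exactly for the remaining bytes of the file,
    // which rules out look-alike byte runs inside compressed data or the comment.
    static int findEndOfCentralDirectory(byte[] zip) {
        int stop = Math.max(0, zip.length - EOCD_FIXED_SIZE - 0xFFFF);
        for (int i = zip.length - EOCD_FIXED_SIZE; i >= stop; i--) {
            if (zip[i] == 0x50 && zip[i + 1] == 0x4b && zip[i + 2] == 0x05 && zip[i + 3] == 0x06) {
                int commentLength = (zip[i + 20] & 0xFF) + 256 * (zip[i + 21] & 0xFF);
                if (i + EOCD_FIXED_SIZE + commentLength == zip.length) {
                    return i;
                }
            }
        }
        return -1; // no valid end of central directory record found
    }

    // The comment length is at offset 20 of the record; the comment starts at offset 22.
    static String readComment(byte[] zip, Charset charset) {
        int eocd = findEndOfCentralDirectory(zip);
        if (eocd < 0) {
            throw new IllegalArgumentException("not a zip archive (no EOCD record)");
        }
        int length = (zip[eocd + 20] & 0xFF) + 256 * (zip[eocd + 21] & 0xFF);
        return new String(zip, eocd + EOCD_FIXED_SIZE, length, charset);
    }
}
```

Reading the same bytes with different charsets makes the encoding question visible: a UTF-8 comment read as ISO-8859-1 shows the 'Â' again.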

There is one small caveat to the length that was causing my corruption error. The comment length (the number of bytes in the encoded comment, not the number of characters) is written as a two-byte little-endian value, low byte first. So, you need to write the length as such:

byte 1: comment length % 256
byte 2: comment length / 256
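In Java, the two bytes can be computed directly from those formulas. A small sketch with a hypothetical helper name; the range check reflects the two-byte field, which cannot hold more than 65535.

```java
// Hypothetical helper: splits a comment length into the two stored bytes.
class CommentLength {
    static byte[] toLittleEndian(int length) {
        if (length < 0 || length > 0xFFFF) {
            throw new IllegalArgumentException("zip comment must be 0..65535 bytes: " + length);
        }
        return new byte[] {
            (byte) (length % 256), // byte 1: low byte
            (byte) (length / 256)  // byte 2: high byte
        };
    }
}
```

For a 300-byte comment this yields 44 and 1, since 300 = 44 + 1 * 256.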


When the length is restored (or when you are reading it), reconstruct it as byte 1 + byte 2 * 256, treating both bytes as unsigned values from 0 to 255.

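Reading the length back needs one Java-specific detail: byte is signed, so each byte must be masked with 0xFF before the arithmetic, or a value such as 0xA9 would be read as -87. A sketch with a hypothetical helper name:

```java
// Hypothetical helper: rebuilds the comment length from the two stored bytes.
class CommentLengthReader {
    static int fromLittleEndian(byte low, byte high) {
        // Mask to 0..255 first; Java bytes range from -128 to 127.
        return (low & 0xFF) + 256 * (high & 0xFF);
    }
}
```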
After these two bytes are written, you simply write out the encoded comment bytes (the same bytes whose count you just wrote) to the end of the file and close it.
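Putting the steps together, here is a minimal sketch that replaces (or adds) the archive comment in an in-memory copy of a zip or jar file. It assumes the comment should be encoded with a single-byte charset the archive tools display correctly (for example ISO-8859-1, where '®' is the single byte 0xAE); writing UTF-8 bytes here would bring the 'Â' back in such tools. The class and method names are hypothetical, not the poster's code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: replaces (or adds) the archive comment of an in-memory zip.
class ZipCommentWriter {
    static final int EOCD_FIXED_SIZE = 22; // EOCD record length without the comment

    // Backward search for "PK\5\6" whose stored comment length matches the file's tail.
    static int findEndOfCentralDirectory(byte[] zip) {
        int stop = Math.max(0, zip.length - EOCD_FIXED_SIZE - 0xFFFF);
        for (int i = zip.length - EOCD_FIXED_SIZE; i >= stop; i--) {
            if (zip[i] == 0x50 && zip[i + 1] == 0x4b && zip[i + 2] == 0x05 && zip[i + 3] == 0x06
                    && i + EOCD_FIXED_SIZE + (zip[i + 20] & 0xFF) + 256 * (zip[i + 21] & 0xFF) == zip.length) {
                return i;
            }
        }
        return -1;
    }

    // Keeps everything before offset 20 of the EOCD record, then writes the new
    // two-byte little-endian length and the encoded comment. Central directory
    // offsets are untouched because the comment is the last thing in the file.
    static byte[] setComment(byte[] zip, String newComment, Charset charset) {
        int eocd = findEndOfCentralDirectory(zip);
        if (eocd < 0) {
            throw new IllegalArgumentException("not a zip archive (no EOCD record)");
        }
        byte[] comment = newComment.getBytes(charset); // length counts bytes, not chars
        if (comment.length > 0xFFFF) {
            throw new IllegalArgumentException("comment longer than 65535 bytes");
        }
        // copyOf drops an old, longer comment or zero-pads room for a longer new one
        byte[] out = Arrays.copyOf(zip, eocd + EOCD_FIXED_SIZE + comment.length);
        out[eocd + 20] = (byte) (comment.length % 256); // low byte
        out[eocd + 21] = (byte) (comment.length / 256); // high byte
        System.arraycopy(comment, 0, out, eocd + EOCD_FIXED_SIZE, comment.length);
        return out;
    }
}
```

To patch a file on disk, one option is to read it with java.nio.file.Files.readAllBytes (Java 7+), pass the bytes through setComment(bytes, "MySoftware®", Charset.forName("ISO-8859-1")), and write the result back; on 1.5, a RandomAccessFile with seek(), write() and setLength() can make the same change in place.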

This worked perfectly for me. Hope it helps anyone else out there that may have been curious.