Zip file archive comment with extended ASCII characters

 
Greenhorn
I am trying to figure out how to write the registered trademark '®' and copyright '©' characters to the archive comment of a zip file. This is the comment for the whole zip file, not the comment for a ZipEntry, although the solution may be similar.

I have tried a lot of different things, but in every case the setComment() method on JarOutputStream (which extends ZipOutputStream) writes an "a with circumflex" (i.e., an 'Â') before each extended ASCII character.

So, instead of:

MySoftware®
Copyright © 2011


I get:

MySoftware®
Copyright © 2011


when viewing the archive comments using WinZip or PKZIP or 7-ZIP or any other archive tool I have tried.
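One plausible explanation for the stray 'Â', offered as an assumption rather than something confirmed in this thread: the comment is being encoded as UTF-8, where '®' (U+00AE) becomes the two bytes 0xC2 0xAE, and the viewer then decodes each byte as a separate single-byte character. A minimal sketch that reproduces the effect (the class and method names are made up for illustration, and ISO-8859-1 stands in for whatever charset the viewer really uses):

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // mimicking a viewer that assumes a single-byte character set.
    // Assumption: the real viewer's charset may differ (e.g. a Windows code page).
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Each extended character comes back with an extra U+00C2 in front of it.
        System.out.println(utf8ReadAsLatin1("MySoftware\u00AE"));
        System.out.println(utf8ReadAsLatin1("Copyright \u00A9 2011"));
    }
}
```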

I have tried converting to Unicode escapes, but since the setComment() implementation appears to write only single bytes, I get the literal string '\u00A9' in the comment.

Does anyone have a solution for this? Or know how to write the comments to a zip file without using the setComment() method (appending the comment directly to the end of the file)? I have tried the latter, but I am somehow corrupting the archive when doing so.

I know that I could simply use '(R)' and '(c)' instead, but I would rather use the extended ASCII characters, as they look better. I also know that this can be done via WinZip's command line utility; but I would like to use Java so I don't have to buy a zip license just to add an archive comment to a jar file.

Thank you in advance for any help you can offer.
 
Rancher
The java.util.zip package is not particularly Unicode-savvy (see this old bug, due to be fixed in Java 7). Unless you can wait for that, check out the Apache Commons Compress library (which includes a UnicodeCommentExtraField class that looks promising in this context).
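For anyone who can use Java 7 or later, here is a minimal sketch of what the fixed API allows: the ZipOutputStream(OutputStream, Charset) constructor controls how entry names and comments are encoded. The class name, method name and placeholder entry below are hypothetical, and whether a given viewer then displays the comment correctly still depends on the charset that viewer assumes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class CharsetCommentDemo {
    // Builds a small in-memory zip whose archive comment is encoded with the
    // given charset (requires Java 7+ for the two-argument constructor).
    static byte[] zipWithComment(String comment, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, charset)) {
            // Placeholder entry, only so the archive is not empty.
            zip.putNextEntry(new ZipEntry("placeholder.txt"));
            zip.write("hello".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
            // Must be set before the stream is finished or closed.
            zip.setComment(comment);
        }
        return buffer.toByteArray();
    }
}
```

With StandardCharsets.ISO_8859_1, '®' is written as the single byte 0xAE instead of the two-byte UTF-8 sequence.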
 
Ivan Bell
Greenhorn
Yeah, I saw that upcoming Charset arg in the JarFile constructor; however, the majority of my customers are still on 1.5 and it is unlikely that they will shift to 1.7 anytime soon.

I am currently investigating how to manually write/replace the comment in the zip file (without corrupting it). This is probably the only way to fix it without patching a whole lot of java.util.jar and java.util.zip classes.

I will post the solution when I figure it out. Anyone else who already knows how to manually add/replace an archive comment in the zip file can post it first.
 
Ulf Dittmer
Rancher

Anyone else who already knows how to manually add/replace an archive comment in the zip file can post it first.


The Commons Compress library is no good?
 
Ivan Bell
Greenhorn
The Commons distro is fine; however, I wanted to find the solution for my own edification and as an intellectual exercise.

The solution is really very simple (in hindsight, of course). After studying the zip file format, I found that the comment length and content sit at the very end of the zip file. You must first find the end of the zip file's entries and central directory, which is marked by the "magic" byte sequence "0x50, 0x4b, 0x05, 0x06". The two-byte comment length is stored 20 bytes after the start of that sequence, and the comment itself follows immediately. Once you have found this byte sequence in the zip file, you can read or write the comment. If there is no comment, the length field is still present and holds zero, so you overwrite it with the new length and append the comment string to the end of the zip file.

There is one small caveat about the length, and it was the cause of my corruption error. The comment length is written as a two-byte little-endian sequence, so you need to write it as follows:

byte 1: comment length % 256
byte 2: comment length / 256

E.g.,
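As a minimal sketch of that arithmetic in Java (the class and method names are made up for illustration and are not taken from the original post):

```java
final class CommentLengthEncoder {
    // Splits a comment length into the two little-endian bytes described above:
    // low-order byte first (length % 256), high-order byte second (length / 256).
    static byte[] encodeLength(int length) {
        if (length < 0 || length > 0xFFFF) {
            throw new IllegalArgumentException("length must fit in two bytes: " + length);
        }
        return new byte[] { (byte) (length % 256), (byte) (length / 256) };
    }
}
```

For a 300-byte comment this gives the bytes 0x2C (44) and 0x01, since 300 = 1 * 256 + 44.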



When the length is restored (or, if you are trying to read it), you will reconstruct as:
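A minimal sketch of the reverse step, again with hypothetical names. The masks matter because Java bytes are signed, so any byte value above 127 would otherwise turn negative:

```java
final class CommentLengthDecoder {
    // Rebuilds the comment length from its two little-endian bytes.
    // "& 0xFF" converts each signed Java byte to its unsigned value (0..255).
    static int decodeLength(byte low, byte high) {
        return (low & 0xFF) + (high & 0xFF) * 256;
    }
}
```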




After these two bytes are written, you simply write out your Unicode-encoded string to the end of the file and close it.

This worked perfectly for me. Hope it helps anyone else out there that may have been curious.
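Putting the steps above together, here is a minimal sketch of the whole procedure. It is not the code used in this thread, which is not shown; the class and method names are hypothetical. It assumes the end record is the last thing in the file (no trailing data after the comment) and leaves the choice of encoding to the caller, who passes the comment as bytes. It sticks to APIs that should also be available on older Java versions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ZipCommentWriter {
    private static final int END_RECORD_SIZE = 22;    // fixed part of the end record
    private static final int COMMENT_LEN_OFFSET = 20; // length field, relative to the signature

    // Scans backwards for the signature 0x50 0x4b 0x05 0x06 and returns its
    // offset, or -1 if not found. A candidate is accepted only if its stored
    // comment length reaches exactly to the end of the file.
    static long findEndRecord(RandomAccessFile file) throws IOException {
        long fileLength = file.length();
        long lowest = Math.max(0, fileLength - END_RECORD_SIZE - 0xFFFF);
        for (long pos = fileLength - END_RECORD_SIZE; pos >= lowest; pos--) {
            file.seek(pos);
            if (file.read() == 0x50 && file.read() == 0x4b
                    && file.read() == 0x05 && file.read() == 0x06) {
                file.seek(pos + COMMENT_LEN_OFFSET);
                int storedLength = file.read() + file.read() * 256;
                if (pos + END_RECORD_SIZE + storedLength == fileLength) {
                    return pos;
                }
            }
        }
        return -1;
    }

    // Replaces (or adds) the archive comment with the given, already encoded, bytes.
    static void writeComment(File zip, byte[] comment) throws IOException {
        if (comment.length > 0xFFFF) {
            throw new IllegalArgumentException("comment longer than 65535 bytes");
        }
        RandomAccessFile file = new RandomAccessFile(zip, "rw");
        try {
            long end = findEndRecord(file);
            if (end < 0) {
                throw new IOException("end record signature not found");
            }
            file.seek(end + COMMENT_LEN_OFFSET);
            file.write(comment.length % 256); // low-order byte first
            file.write(comment.length / 256); // then high-order byte
            file.write(comment);
            // Drop any leftover bytes from a longer previous comment.
            file.setLength(file.getFilePointer());
        } finally {
            file.close();
        }
    }
}
```

A call might look like ZipCommentWriter.writeComment(new File("my.jar"), "Copyright \u00A9 2011".getBytes("ISO-8859-1")), with the charset picked to match whatever the target viewers expect. It is worth trying this on a copy of the archive first, since the file is modified in place.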
 