JavaRanch » Java Forums » Java » Java in General

Corrupt file name after compression.

pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

I am using the following code to zip a file whose name is shown in the attachment. The file name contains Spanish characters. After compression the file name is different, and some characters are shown as +- (see the attachment). Can anybody tell me how to resolve this?




[Thumbnail for zip pic.JPG]



Pawan Chopra
SCJP - DuMmIeS mInD
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38398
    
  23
Don't know. But it may be worthwhile unzipping the file and seeing whether it is restored correctly. Some characters don't display correctly on screen; Windows seems to be worse for that than other operating systems.
pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

I have tried unzipping the files, but the names are not restored correctly. I can see the correct file name while the file is uncompressed; I don't know why the problem appears only after compression.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38398
    
  23
Don't know. Sorry.
pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

I think I have found the problem. I looked at the code of ZipOutputStream: it gets the byte array of the file name and then writes those bytes as the name. I tried doing the same in the program below. It prints negative values for special characters like é.





Can anyone tell me how I can resolve this?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38398
    
  23
Bytes run from -128 (0x80) to +127 (0x7f). In a single-byte encoding such as ISO-8859-1, the accented characters used in Western European languages other than English are in the range 0x80-0xff, so two's complement treats them as negative numbers. You can find the numbers in Unicode (1) and (2). I note that some of the characters in (2) are control characters.

Not sure what you are supposed to do next, but it has something to do with casting to a char, or casting to a char and doing a bitwise AND (&) with 0xff.
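
To illustrate the point about negative bytes: the sketch below encodes a sample name ("café", chosen here only as an example) and prints each byte both as Java's signed value and after masking with 0xff. The class and method names are hypothetical, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class ByteSignDemo {

    // Widen a signed Java byte to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        // "café", written with a Unicode escape to avoid source-encoding issues.
        String name = "caf\u00e9";

        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // In ISO-8859-1, é is the single byte 0xE9, which Java prints as -23.
        byte last = latin1[latin1.length - 1];
        System.out.println(last + " -> " + toUnsigned(last));

        // In UTF-8, é becomes the two bytes 0xC3 0xA9; both print as negative.
        for (byte b : utf8) {
            System.out.printf("%d -> 0x%02x%n", b, toUnsigned(b));
        }
    }
}
```

The negative values are only how Java prints a signed byte; the underlying bits are unchanged, so they are not in themselves the cause of the corrupted name.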
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41572
    
  54
Not sure if it's related, but there's a longstanding problem with the java.util.zip package in that it doesn't deal well with non-ASCII filenames. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244499 for more information.
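
For readers on Java 7 or later, a possible way around this is the ZipOutputStream constructor that takes a Charset, which controls how entry names are encoded. A minimal sketch, assuming an in-memory archive and a made-up entry name; the class and helper names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipNameCharsetDemo {

    // Write a single-entry archive, encoding the entry name with the given charset.
    static byte[] zipOne(String entryName, byte[] content, Charset cs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, cs)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return buf.toByteArray();
    }

    // Read back the name of the first entry, decoding it with the given charset.
    static String firstEntryName(byte[] zip, Charset cs) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "canci\u00f3n.txt"; // "canción.txt" (sample name, an assumption)
        byte[] content = "hola".getBytes(StandardCharsets.UTF_8);

        // Write and read with the same charset and check the name survives.
        byte[] zip = zipOne(name, content, StandardCharsets.UTF_8);
        System.out.println(name.equals(firstEntryName(zip, StandardCharsets.UTF_8)));
    }
}
```

Whether a given external tool then displays the name correctly depends on which encoding that tool expects, so the charset may need to be matched to the tool.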


Ping & DNS - my free Android networking tools app
pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

Campbell Ritchie wrote:
Not sure what you are supposed to do next, but it has something to do with casting to a char, or casting to a char and doing a bitwise AND (&) with 0xff.


Actually the Java API does all of the file-name handling itself, so I am not sure how to implement this while keeping the rest of the features working the same. Can you suggest a solution?

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41572
    
  54
Have you read the bug report I linked to? Are you certain that that is not the problem?
pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

Ulf Dittmer wrote: Have you read the bug report I linked to? Are you certain that that is not the problem?



Thanks for the link, Ulf. Yes, I am facing the same problem, but I am very surprised that the bug still exists after 7 years. It was reported in 2001. Do you know of any specific reason why it has not been fixed?
pawan chopra
Ranch Hand

Joined: Jan 23, 2008
Posts: 410

I have executed the following experiment:
- created several text files with English, Hungarian, Chinese, Japanese and
Korean name
- attempted to compress them using FilZip, WinZip and PKZip
- attempted to uncompress them using the above tools
My findings are:
- FilZip and WinZip cannot add files with non-English-only names (not even
Hungarian which uses Latin characters); they cannot list files
- PKZip can add files with any name, but names are transformed: all
non-Western European accented Latin characters are converted to similar
character without accent (e.g. ű->u, ő->o) and all non-Latin characters are
converted to question marks; NOTE: Accented Western European characters are
preserved (e.g. áéíóöúüñ), thus Spanish is supported
- WinZip cannot list non-Western European file names, but can extract the
files when "Extract all" is selected; but non-Latin characters are replaced
with underscore (_); since all non-Western European Latin characters are
converted to non-accented Western European ones during compression, these files
are listed and extracted but without accents.
- FilZip and PKZip can display and extract all files but with transformation;
see above

Summary: The ZIP format does not support Unicode in filenames. It might be possible
to pick one specific code page/character set that would be usable for a
specific language, but it is not known how, as the tested tools do not provide
control for this.

Solution: No real solution. As a workaround, Spanish text should be used with all
accented characters replaced with their non-accented relatives (ú->u, ó->o, etc.), or
files should be compressed using the ISO8859P1 character set for filenames.

Note: PKZip is one of the first zip utilities for Windows; WinZip is the market
leader. If they cannot support Unicode, how could we?
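
The accent-stripping workaround described above could be sketched in Java with java.text.Normalizer (available since Java 6). This is only an illustration with a hypothetical class name and sample file name, not what the tested tools do internally:

```java
import java.text.Normalizer;

public class AccentStripper {

    // Decompose accented letters (NFD), then drop the combining marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "canción_española.txt", written with escapes to avoid source-encoding issues.
        String name = "canci\u00f3n_espa\u00f1ola.txt";
        System.out.println(stripAccents(name)); // prints cancion_espanola.txt
    }
}
```

Characters with no canonical decomposition, such as ø or ß, pass through unchanged, so a name can still contain non-ASCII characters after this step.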