
Corrupt file name after compression.

 
pawan chopra
Ranch Hand
Posts: 415
jQuery Mac Objective C
I am using the following code to zip a file whose name is shown in the attachment. The file name consists of Spanish characters. After compression the file name is different, and some characters are shown as +-, as shown in the attachment. Can anybody tell me how to resolve it?

[Attachment: zip pic.JPG (File names)]
 
Campbell Ritchie
Sheriff
Pie
Posts: 47216
52
Don't know. But it may be worthwhile unzipping the file and seeing whether it is restored correctly. Some characters don't display correctly on screen; Windows seems to be worse for that than other operating systems.
 
pawan chopra
I have tried unzipping them, but the names are not restored correctly. I can see the correct file name when the file is not compressed; I don't know why this happens only after compression.
 
Campbell Ritchie
Don't know. Sorry.
 
pawan chopra
I think I have found the problem. I looked at the code in ZipOutputStream: it gets the byte array of the file name and then writes the name of the file. I have tried doing the same in the program below. It prints negative values for special characters like é.





Can anyone tell me how I can resolve this?
 
Campbell Ritchie
Bytes run from -128 (0x80) to +127 (0x7f). The characters used in Western European languages other than English are in the range 0x80-0xff, so they are regarded by two's complement as negative numbers. You can find the numbers in Unicode (1) and (2). I note some of those characters in no (2) are control characters.

Not sure what you are supposed to do next, but it has something to do with casting to a char, or casting to a char and doing a bitwise AND (&) with 0xff.
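To make the point about signed bytes concrete, here is a minimal sketch (not the original poster's program, which is not shown in the thread). It assumes the name is encoded as ISO-8859-1 purely for illustration; the class and method names are hypothetical. Masking with 0xff after widening to an int recovers the unsigned value.

```java
import java.nio.charset.StandardCharsets;

public class SignedByteDemo {

    // Encode a string as ISO-8859-1 (an assumption chosen for illustration).
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // A Java byte is signed; masking with 0xff yields the value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent; the escape avoids
        // depending on the encoding of this source file.
        for (byte b : latin1Bytes("\u00e9")) {
            System.out.println("signed: " + b
                    + ", unsigned: " + toUnsigned(b)
                    + " (0x" + Integer.toHexString(toUnsigned(b)) + ")");
        }
        // prints "signed: -23, unsigned: 233 (0xe9)"
    }
}
```

The negative number is only how Java displays the byte. The bits written to the archive are the same either way, so the negative values by themselves do not explain the corrupted names.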
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Not sure if it's related, but there's a longstanding problem with the java.util.zip package in that it doesn't deal well with non-ASCII filenames. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244499 for more information.
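For readers on Java 7 or later: java.util.zip.ZipOutputStream and ZipInputStream each have a constructor that takes a Charset for entry names. The sketch below assumes such a JDK and uses UTF-8; the class and method names are hypothetical, and whether a particular unzip tool then displays the names correctly still depends on that tool. It writes one entry to an in-memory archive and reads the name back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Utf8ZipNames {

    // Create an in-memory archive holding one entry, with its name encoded as UTF-8.
    static byte[] zipOne(String entryName, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the name of the first entry back, decoding it with the same charset.
    static String firstEntryName(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "canci\u00f3n.txt"; // a Spanish name with an accented o
        byte[] archive = zipOne(name, new byte[] {1, 2, 3});
        System.out.println(name.equals(firstEntryName(archive)));
    }
}
```

The charset argument could be swapped for another one (for example ISO-8859-1) if the tool that will open the archive expects a specific code page; which one a given tool expects is something to test rather than assume.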
 
pawan chopra
Campbell Ritchie wrote:
Not sure what you are supposed to do next, but it has something to do with casting to a char, or casting to a char and doing a bitwise AND (&) with 0xff.


Actually, the Java API does all of this file name handling internally. I am not sure how to implement this functionality myself while keeping the rest of the features working the same. Could you kindly suggest a solution?

 
Ulf Dittmer
Have you read the bug report I linked to? Are you certain that that is not the problem?
 
pawan chopra
Ulf Dittmer wrote:Have you read the bug report I linked to? Are you certain that that is not the problem?



Thanks for the link, Ulf. Yes, I am facing the same problem, but I am very surprised that the bug still exists after 7 years. That bug was reported in 2001. Do you know of any specific reason it has not been fixed?
 
pawan chopra
I have executed the following experiment:
- created several text files with English, Hungarian, Chinese, Japanese and
Korean name
- attempted to compress them using FilZip, WinZip and PKZip
- attempted to uncompress them using the above tools
My findings are:
- FilZip and WinZip cannot add files with non-English-only names (not even
Hungarian which uses Latin characters); they cannot list files
- PKZip can add files with any names, but names are transformed: all
non-Western European accented Latin characters are converted to similar
character without accent (e.g. ű->u, ő->o) and all non-Latin characters are
converted to question marks; NOTE: Accented Western European characters are
preserved (e.g. áéíóöúüñ), thus Spanish is supported
- WinZip cannot list non-Western European file names, but can extract the
files when "Extract all" is selected; but non-Latin characters are replaced
with underscore (_); since all non-Western European Latin characters are
converted to non-accented Western European ones during compression, these files
are listed and extracted but without accents.
- FilZip and PKZip can display and extract all files but with transformation;
see above

Summary: The ZIP format does not support Unicode in filenames. It might be
possible to pick one specific code page/character set that would be usable for
a specific language, but it is not known how, as the tested tools do not
provide control for this.

Solution: No real solution. As a workaround, Spanish text should be used with
all accented characters replaced with their non-accented relatives (ú->u, ó->o,
etc.), or files should be compressed using the ISO8859P1 character set for
filenames.

Note: PKZip is one of the first zip utilities for Windows; WinZip is the market
leader. If they cannot support Unicode, how could we?
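
The accent-replacement workaround quoted above could be sketched as follows. This is one possible approach, not something taken from the bug report: it assumes java.text.Normalizer (available since Java 6) and uses hypothetical class and method names. The name is decomposed into base letters plus combining marks, and the marks are then dropped before the name is passed to ZipEntry.

```java
import java.text.Normalizer;

public class AccentStripper {

    // Decompose accented letters (NFD), then remove the combining marks,
    // leaving only the unaccented base letters.
    static String stripAccents(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source-file encoding.
        System.out.println(stripAccents("canci\u00f3n_ni\u00f1o.txt"));
        // prints "cancion_nino.txt"
    }
}
```

This changes the stored names, so it only suits cases where losing the accents is acceptable. Characters that have no decomposition (non-Latin scripts, for instance) pass through unchanged and would still need separate handling.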
 