
Downloaded CSV file header is garbled when Japanese is selected on a Linux server

 
Greenhorn
Posts: 4
When we download a CSV file from a Linux server using Java code, the header values come out garbled for Japanese.
It works fine when run on the Windows platform.
Following is my sample code.



Here, label 1, label 2, etc. are available in two languages (English and Japanese). When we select English, it works fine in both environments (Linux and Windows).
But when we select Japanese, it works only on Windows, not on Linux.
What is wrong in the code, or is there another way to solve this issue?
 
Greenhorn
Posts: 3
Linux and Windows use different character encoding systems by default. (https://superuser.com/questions/294219/what-are-the-differences-between-linux-and-windows-txt-files-unicode-encoding)

In short, Linux usually defaults to using UTF-8 encoding, which is fine for Latin scripts, but can't represent Japanese (because 8 bits isn't enough to represent all the characters in the language). The reason you get gibberish is that at some point, when it takes the larger unit data and converts it to UTF-8 (or some other smaller encoding type) automatically, it simply discards all the information in the extra bits.

This could be an issue with the program you are using to display data on Linux (in which case the code is fine). Have you tried copying the data file you generate on the Linux system and trying to display it on a Windows machine? If this turns out to be the problem, you need to change the settings of whatever program you are displaying data with on Linux.

Otherwise, if it is a code issue, there is likely to be a setting somewhere in CSVConfig (I've never used that library, so you would have to check the documentation) to make sure it is using UTF-16 Unicode (rather than the system default). You would need to make sure the program is using the correct (i.e. larger) encoding system so that it can handle Japanese characters.
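To see which "system default" each server actually uses, something like the following could be run on both machines. This is only a minimal sketch; the class name `DefaultCharsetCheck` is made up for illustration and is not part of the original code. Note that since Java 18 the JVM default is UTF-8 regardless of platform, so the result also depends on the Java version.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Builds a one-line report of the charset the JVM uses when
    // no encoding is given explicitly (hypothetical helper).
    static String describeDefaults() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this on the Windows server and on the Linux server and compare.
        System.out.println(describeDefaults());
    }
}
```

If the two servers report different values, any code that relies on the default (for example `String.getBytes()` with no argument) will produce different bytes on each of them.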
 
Marshal
Posts: 28193

Jimmy Robov wrote: In short, Linux usually defaults to using UTF-8 encoding, which is fine for Latin scripts, but can't represent Japanese (because 8 bits isn't enough to represent all the characters in the language). The reason you get gibberish is that at some point, when it takes the larger unit data and converts it to UTF-8 (or some other smaller encoding type) automatically, it simply discards all the information in the extra bits.



Actually UTF-8 is specifically designed so that it can encode all scripts, including the CJK scripts which Japanese uses. The actual reason that you don't see the correct data is that something is making an incorrect assumption about the encoding being used by the download and hence using the wrong encoding to write the file.
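Here is a small sketch of both points, using only the standard library. The class and method names are hypothetical, and the sample text (the two characters of "name" in Japanese, written as Unicode escapes so the source file's own encoding doesn't matter) is just an illustration. The first method shows that UTF-8 encodes Japanese without loss; the second shows what happens when the same bytes are decoded with a wrong assumption, here ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Encode as UTF-8 and decode as UTF-8: nothing is lost.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: each byte becomes
    // one unrelated character, which is the typical "junk" output.
    static String decodeWrongly(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String header = "\u540d\u524d"; // two Japanese characters
        System.out.println(header.equals(roundTrip(header)));     // true
        System.out.println(header.equals(decodeWrongly(header))); // false
        System.out.println(decodeWrongly(header).length());       // 6
    }
}
```

Each of the two characters takes three bytes in UTF-8, so the wrongly decoded string has six characters instead of two. No information was discarded; it was only interpreted with the wrong charset.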

So your code coincidentally happens to work correctly with the Windows server because the DownloadUtils.download() method copies the data in the correct encoding. Possibly that's because the Windows server uses the encoding which you expect the output file to have. My approach would be, for a start, to ensure that the Windows and Unix servers use the same encoding for the download you are using.
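One way to take the server's default out of the picture is to name the encoding explicitly wherever characters are turned into bytes. The sketch below is not the original `DownloadUtils.download()` code (which I haven't seen); `CsvHeaderWriter` and its methods are hypothetical names, and it writes to a plain `OutputStream` as a stand-in for the download stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CsvHeaderWriter {

    // Writes one CSV header row, always encoded as UTF-8,
    // no matter what the platform default charset is.
    static void writeHeader(OutputStream out, String... labels) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(String.join(",", labels));
        writer.write("\r\n");
        writer.flush(); // push the encoded bytes through to 'out'
    }

    // Convenience wrapper that returns the encoded bytes.
    static byte[] headerBytes(String... labels) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeHeader(buffer, labels);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = headerBytes("ID", "\u540d\u524d");
        System.out.println(bytes.length); // 11 on every platform
    }
}
```

If the file is sent from a servlet, the response should presumably declare the same encoding (for example with `response.setContentType("text/csv; charset=UTF-8")`), so that the client doesn't have to guess either.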

I assume that the client code is always running on the same machine, and it's only the servers which run on different operating systems. If the client code works differently on Windows and Unix machines as well, that's another possible source of problems.
 