
International Characters

Kasi Viswan
Ranch Hand

Joined: Sep 27, 2008
Posts: 42
I have a Java application that processes a lot of text representing data from many countries and languages.

I have to read in the data, process it (web service calls), and write out the logs.

I changed the encoding used when reading streams from the default to UTF-8 to support Chinese characters, and it worked fine.

When I was using the default system encoding, the application supported characters in German, but since I made the change to UTF-8, my application no longer supports German characters. They show up as ? and so on.

I can change back to the default and process German characters again, but is there a way to read a file, determine its encoding, configure the InputStreamReader to read with that encoding, and configure log4j to write the text out again in the same encoding?

Any pointers in the right direction are much appreciated.

Thanks
Kasi
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42611
Just to be clear, with the default encoding the code works with German text, and with UTF-8 it works with Chinese text? That sounds as if the code is not properly processing inputs that come in various encodings. You should never rely on the default encoding - the code should always be aware of what encoding any input is in, and act accordingly.
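
As a minimal sketch of what "being aware of the encoding" can look like, the snippet below passes the charset to InputStreamReader explicitly instead of relying on the platform default. The class and method names (ExplicitCharsetRead, readAll) are made up for illustration, and the sample assumes the German text was stored as ISO-8859-1, which the thread does not confirm. Decoding those bytes as UTF-8 produces replacement characters, which would be consistent with the ? symptom described above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetRead {

    // Reads the whole stream as text using the charset named by the caller,
    // never the platform default. (Hypothetical helper, not from the thread.)
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the German input was saved as ISO-8859-1, where the
        // u-umlaut and sharp s are each a single byte.
        String original = "Gr\u00fc\u00dfe";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the text round-trips.
        String right = readAll(new ByteArrayInputStream(latin1),
                               StandardCharsets.ISO_8859_1);
        // Wrong charset: the single bytes are malformed UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        String wrong = readAll(new ByteArrayInputStream(latin1),
                               StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round trip ok: " + right.equals(original));
        System.out.println("UTF-8 decode damaged text: " + wrong.contains("\uFFFD"));
    }
}
```

The same applies on the output side: whatever writes the logs needs to be told its encoding explicitly as well.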

There is no easy way to determine the encoding of a file, but you can try http://jchardet.sourceforge.net/
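
If only two kinds of input are expected (UTF-8 and one known legacy encoding), a cruder alternative to a detection library is to try a strict UTF-8 decode and fall back when it fails. The sketch below uses only the JDK and does not use jchardet; the class name Utf8OrFallback and the choice of ISO-8859-1 as the fallback are assumptions for illustration. It is a heuristic: it cannot tell two legacy single-byte encodings apart, and short legacy text can occasionally happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8OrFallback {

    // Returns true only if the bytes are well-formed UTF-8. REPORT makes the
    // decoder throw instead of silently substituting replacement characters.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes as UTF-8 when the bytes are valid UTF-8, otherwise with the
    // caller-supplied fallback charset (an assumption about the legacy data).
    static String decode(byte[] bytes, Charset fallback) {
        Charset chosen = isValidUtf8(bytes) ? StandardCharsets.UTF_8 : fallback;
        return new String(bytes, chosen);
    }

    public static void main(String[] args) {
        byte[] german = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.ISO_8859_1);
        byte[] chinese = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        System.out.println("German bytes valid UTF-8?  " + isValidUtf8(german));
        System.out.println("Chinese bytes valid UTF-8? " + isValidUtf8(chinese));
        System.out.println(decode(german, StandardCharsets.ISO_8859_1)
                .equals("Gr\u00fc\u00dfe"));
    }
}
```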


Kasi Viswan
Ranch Hand

Joined: Sep 27, 2008
Posts: 42
Is it safe to assume all input files are UTF-8?

I converted the German file to UTF-8 with Notepad++ and my application works now, so it supports both German and Chinese with UTF-8.
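
The same one-off conversion can also be done in Java when the source encoding is known. This is only a sketch: the class name ToUtf8 is made up, and the source charset has to be supplied by the caller because the thread never says what the German file was originally encoded in.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ToUtf8 {

    // Re-encodes a text file: decodes it with the given source charset and
    // writes the characters to the target file as UTF-8.
    static void convert(Path source, Charset sourceCharset, Path target)
            throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out =
                 Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ToUtf8 <source> <sourceCharsetName> <target>
        if (args.length != 3) {
            System.err.println("usage: ToUtf8 <source> <charset> <target>");
            return;
        }
        convert(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

Note that Files.newBufferedReader throws an exception on malformed input instead of substituting characters, so naming the wrong source charset tends to fail loudly rather than corrupt the output quietly.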