Different character sets on Unix/Windows? What's happening?

Joshua White
Ranch Hand

Joined: Jun 04, 2001
Posts: 97
All,

When reading a text file containing non-ASCII Latin characters, I can read the file and display its contents on the console on Windows without a problem.

When I do the same on Unix, the special characters are replaced by the '?' character. The strange thing is that when I run "more" on the file, Unix displays the special characters correctly.

If it were a character set problem, I would expect Unix to display the file incorrectly with the "more" command as well.

Any idea what is going on here?

Regards,

Joshua
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

You didn't post any code but I expect you are reading the text file without specifying any charset (encoding). And yes, the default charset is different in different environments.

To specify the charset, wrap the input stream in an InputStreamReader and pass the charset name to its constructor.

Deciding which charset to use is a bit more difficult. If you created the file yourself on Windows without specifying a charset (you would use an OutputStreamWriter to do that), then you used the default charset for Windows. On my Windows box that's "cp1252", but in general it's the value of the "file.encoding" Java system property.
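As a rough illustration of reading with an explicit charset, here is a minimal sketch. The class name, the `readAll` helper, the placeholder file name "input.txt" and the choice of "windows-1252" are all assumptions made for the example; they are not taken from the original poster's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ExplicitCharsetRead {

    // Decodes everything from the stream using the given charset,
    // so the platform default charset is never consulted.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder; pass the real file name as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "input.txt");
        // Assumption: the file was written on Windows with the cp1252 default.
        Charset charset = Charset.forName("windows-1252");
        try (InputStream in = Files.newInputStream(file)) {
            System.out.print(readAll(in, charset));
        }
    }
}
```

Note that this only fixes the decoding side. The final `System.out.print` still encodes with whatever charset the console stream uses, so question marks can still appear there.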
Joshua White
Ranch Hand

Joined: Jun 04, 2001
Posts: 97
Using the following:



I have run the above with no luck. I still receive only question marks. Any other ideas?

Regards,

Joshua
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

You read the data into the program. Then you write it out again. You've changed the way you read the file several times and the output hasn't changed. That suggests to me, at least, that the problem is not with the reading half of the program.

It would be easier just to put some code in your program that prints one of those characters directly, without reading any file. If it comes out as a question mark there too, that would persuade me that System.out.print doesn't handle those characters correctly. I know it doesn't on Windows (you get some other character instead) because the DOS code page isn't the same as the file charset, but I'm not familiar with Unix consoles. A console may not be a reliable testing device.
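One way to run that kind of test is sketched below. It is not the snippet from the original post; the class name, the `printedBytes` helper and the sample character U+00E9 are assumptions chosen for illustration. It shows the bytes a PrintStream emits for the same character under three encodings, then prints the character through System.out for comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingCheck {

    // Returns the bytes a PrintStream emits for the text under the named encoding.
    static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, encoding)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent), written as an escape so the
        // source file's own encoding cannot interfere with the test.
        String sample = "\u00e9";
        for (String encoding : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : printedBytes(sample, encoding)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(encoding + ": " + hex.toString().trim());
        }
        // What the console shows here depends on System.out's encoding
        // and on what the terminal itself expects.
        System.out.println(sample);
    }
}
```

The first three lines of output are `US-ASCII: 3F`, `ISO-8859-1: E9` and `UTF-8: C3 A9`. Byte 3F is '?', which is what an encoder substitutes when the target charset cannot represent a character. If the JVM on the Unix box has picked a default such as US-ASCII, that would explain question marks even when the file was decoded correctly.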

[Edited to have correct syntax]
[ December 14, 2005: Message edited by: Paul Clapham ]
Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

I guess your environment is set up differently for normal operations, and for less:



http://home.arcor.de/hirnstrom/bewerbung
Ajay Reddy
Ranch Hand

Joined: Apr 08, 2005
Posts: 43
I spent the last four days trying to figure this one out, and finally someone told me about this:

When starting up your JVM, specify the option "-Dfile.encoding=ISO-8859-1".
This did the trick for me. I know this is an old post, but I thought people finding this thread would want an answer.
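To see what a given JVM has actually picked up, with or without that option, a small diagnostic along these lines may help. The class name and the `report` helper are made up for this sketch, and the environment variables listed are only the usual Unix locale settings. How the flag is honored can vary between Java versions, so treat the output as a check and not a guarantee.

```java
import java.nio.charset.Charset;

class DefaultCharsetReport {

    // Collects the settings that usually decide the JVM's default charset.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("file.encoding = ").append(System.getProperty("file.encoding")).append('\n');
        sb.append("default charset = ").append(Charset.defaultCharset().name()).append('\n');
        // Unix locale variables; unset ones are reported as "null".
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            sb.append(name).append(" = ").append(System.getenv(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it once normally and once with the option (for example `java -Dfile.encoding=ISO-8859-1 DefaultCharsetReport`) shows whether the default charset changes on that system.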
Mr. C Lamont Gilbert
Ranch Hand

Joined: Oct 05, 2001
Posts: 1170

Could you tell us which Linux this is? Some default to UTF-8 and some do not.
 
 