
Different character sets on Unix/Windows? What's happening?

 
Joshua White
Ranch Hand
Posts: 97
All,

When reading a text file containing Latin characters (accented letters, for example), I can read the file and display its contents on the console on Windows without a problem.

When I do the same on Unix, the special characters are replaced by the '?' character. The strange thing is that when I run "more" on the file, Unix displays the special characters correctly.

If it were a character set problem, I would expect "more" on Unix to display the file incorrectly as well.

Any idea what is going on here?

Regards,

Joshua
 
Paul Clapham
Sheriff
Posts: 20203
You didn't post any code but I expect you are reading the text file without specifying any charset (encoding). And yes, the default charset is different in different environments.

To specify the charset, wrap the file's input stream in an InputStreamReader and pass the charset name to its constructor.

Deciding which charset to use is a bit more difficult. If you created the file yourself on Windows without specifying a charset (you would use an OutputStreamWriter to do that), then you used the default charset for Windows. On my Windows box that's "cp1252", but in general it's the value of the "file.encoding" Java system property.
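
The InputStreamReader approach described above could look roughly like the sketch below. The class name ReadWithCharset, the helper readAll, and the default file name "input.txt" are made up for illustration, and "Cp1252" is only a guess at how the file was written; substitute whatever charset actually produced it.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadWithCharset {

    // Reads the whole file, decoding its bytes with the named charset
    // instead of the platform default.
    static String readAll(String path, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charsetName))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are placeholders (assumptions), not values from this thread.
        String path = args.length > 0 ? args[0] : "input.txt";
        String charset = args.length > 1 ? args[1] : "Cp1252";
        System.out.print(readAll(path, charset));
    }
}
```

Note that this only fixes the reading side. The final System.out.print still encodes the text with the JVM's default charset, so the console can show '?' even when the file was decoded correctly.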
 
Joshua White
Ranch Hand
Posts: 97
Using the following:



I have run the above with no luck. I still receive only question marks. Any other ideas?

Regards,

Joshua
 
Paul Clapham
Sheriff
Posts: 20203
You read the data into the program. Then you write it out again. You've changed the way you read the file several times with no change in the result. That suggests to me, at least, that the problem is not with the reading half of the program.

It would be easier just to put some code in your program that prints a string literal containing one of those characters. If it comes out right, the output side is fine. If you get something else, that would persuade me that System.out.print doesn't handle those characters correctly. I know it doesn't on Windows (you get some other character instead) because the DOS code page isn't the same as the file charset, but I'm not familiar with Unix consoles. A console may not be a reliable testing device.
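
A minimal sketch of such a test follows. The class name ConsoleCheck and the helper hexCodes are invented for this example, and the sample word is an arbitrary choice, not something taken from the original file. It prints the string once as text and once as hex code units, so you can tell a decoding problem from a console problem.

```java
public class ConsoleCheck {

    // Formats each char of the string as a four-digit hex code unit.
    static String hexCodes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escape keeps the source file pure ASCII, so the compiler's
        // own source encoding cannot interfere with the test.
        String sample = "caf\u00e9";

        // May show '?' if the output charset cannot represent the last letter.
        System.out.println(sample);

        // Plain ASCII output, readable on any console.
        System.out.println(hexCodes(sample));
    }
}
```

If the second line shows 00e9 for the last character but the first line shows '?', the string inside the program is correct and the loss happens on the way to the console. Running hexCodes on the text read from the file gives the same kind of check for the reading half.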

[Edited to have correct syntax]
[ December 14, 2005: Message edited by: Paul Clapham ]
 
Stefan Wagner
Ranch Hand
Posts: 1923
I guess your environment is set up differently for normal operations and for less.
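
One way to check that guess from inside the JVM is to print the locale-related environment variables next to Java's own idea of the default charset. This is only a sketch: the class name EncodingReport and the helper describe are invented here, and which variables actually matter on a given Unix system is an assumption (LANG, LC_ALL and LC_CTYPE are the usual locale settings, and LESSCHARSET is read by less).

```java
import java.nio.charset.Charset;

public class EncodingReport {

    // Renders one "name = value" line, marking variables that are not set.
    static String describe(String name, String value) {
        return name + " = " + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        // Environment variables that commonly influence terminal and pager encoding.
        for (String var : new String[] {"LANG", "LC_ALL", "LC_CTYPE", "LESSCHARSET"}) {
            System.out.println(describe(var, System.getenv(var)));
        }
        // What the JVM itself decided to use.
        System.out.println(describe("file.encoding", System.getProperty("file.encoding")));
        System.out.println(describe("default charset", Charset.defaultCharset().name()));
    }
}
```

Running it from the same shell that runs the failing program shows whether the JVM picked a charset that differs from what the terminal and pager expect.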

 
Ajay Reddy
Ranch Hand
Posts: 43
I spent the last four days trying to figure this one out, and finally someone told me about this:

When starting up your JVM, specify the option "-Dfile.encoding=ISO-8859-1".
This did the trick for me. I know this is an old post, but I thought people who find this thread would want an answer.
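
For anyone who would rather not change the JVM-wide default, a possible alternative is to wrap System.out in a PrintStream with an explicit charset. This is a sketch, not a tested fix for the original poster's setup: the class name ExplicitOutput and the helper withCharset are invented for illustration, and ISO-8859-1 is assumed only because it is the value used in the option above. The charset has to match what the terminal expects.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitOutput {

    // Wraps a stream so text is encoded with the named charset,
    // regardless of the JVM's default; autoflush is enabled.
    static PrintStream withCharset(OutputStream target, String charsetName)
            throws UnsupportedEncodingException {
        return new PrintStream(target, true, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumed default; pass another charset name as the first argument.
        String charset = args.length > 0 ? args[0] : "ISO-8859-1";
        PrintStream out = withCharset(System.out, charset);
        out.println("caf\u00e9");
    }
}
```

The difference from the command-line option is scope: the wrapper affects only the text written through it, while the option changes the default for every reader and writer in the program that does not name a charset.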
 
Mr. C Lamont Gilbert
Ranch Hand
Posts: 1170
Can you tell us which Linux you are using? Some default to UTF-8 and some do not.
 