
Different character sets on Unix/Windows? What's happening?

 
Joshua White
Ranch Hand
Posts: 97
All,

When reading a text file containing Latin characters (such as the � character), I can read the file and display its contents on the console on Windows without a problem.

When I do the same on Unix, the special characters are replaced by the '?' character. The strange thing is that when I run "more" on the file from the command line, Unix displays the special characters correctly.

If it were a character set issue, I would expect Unix to display the file incorrectly with "more" as well.

Any idea what is going on here?

Regards,

Joshua
 
Paul Clapham
Marshal
Posts: 28177
You didn't post any code but I expect you are reading the text file without specifying any charset (encoding). And yes, the default charset is different in different environments.

To specify the charset, use an InputStreamReader and pass the charset name to its constructor.

Deciding which charset to use is a bit more difficult. If you created the file yourself on Windows without specifying a charset (you would use an OutputStreamWriter to do that), then you used the default charset for Windows. On my Windows box that's "cp1252", but in general it's the value of the "file.encoding" Java system property.
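The code originally posted here was not preserved. As a rough sketch of the InputStreamReader approach described in this reply, something like the following might work. The class name, the `readAll` helper, and the temporary sample file are all hypothetical, and ISO-8859-1 is only an assumed charset for the sake of the example; older Java versions would pass the charset name as a string ("ISO-8859-1") instead of using `StandardCharsets`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetReadSketch {

    // Reads the whole file, decoding its bytes with the given charset
    // instead of the platform default.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: the single byte 0xE9, which is U+00E9 in ISO-8859-1.
        Path sample = Files.createTempFile("latin1-sample", ".txt");
        Files.write(sample, new byte[] {(byte) 0xE9});

        String decoded = readAll(sample, StandardCharsets.ISO_8859_1);
        // Print the code point in hex rather than the character itself,
        // so the check does not depend on the console's own encoding.
        System.out.println(Integer.toHexString(decoded.charAt(0))); // prints e9
        Files.delete(sample);
    }
}
```

Printing the code point rather than the character keeps the reading half of the test separate from whatever the console does with non-ASCII output.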
 
Joshua White
Ranch Hand
Posts: 97
Using the following:



I have run the above with no luck. I still receive only question marks. Any other ideas?

Regards,

Joshua
 
Paul Clapham
Marshal
Posts: 28177
You read the data into the program. Then you write it out again. You've changed the way you read the file several times with no change in the output. That suggests to me, at least, that the problem is not with the reading half of the program.

It would be easier just to put some code in your program that prints one of those characters directly, without reading it from the file. What comes out would persuade me whether System.out.print fails to handle those characters correctly. I know it doesn't handle them on Windows (you get some other character instead) because the DOS code page isn't the same as the file charset, but I'm not familiar with Unix consoles. A console may not be a reliable testing device.
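The snippets from this reply did not survive, so the following is only a sketch of the kind of test being suggested: print a known non-ASCII character directly, then look at the bytes that different charsets would send to the console. The class name and the `encodedHex` helper are hypothetical. One thing it illustrates is that a PrintStream encoding to plain ASCII turns an unmappable character into '?', which may be what is happening on the Unix box if the JVM picked an ASCII default there.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingCheck {

    // Encodes the text the way a PrintStream with the given charset name would,
    // and returns the bytes as hex so the result is readable on any console.
    static String encodedHex(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, true, charsetName);
        out.print(text);
        out.flush();
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+00E9, written as an escape so the source file's encoding is irrelevant.
        String sample = "\u00e9";

        // Step 1: print it directly. If this shows '?', the writing half is the problem.
        System.out.println(sample);

        // Step 2: show the bytes each charset would produce for the same character.
        System.out.println("ISO-8859-1: " + encodedHex(sample, "ISO-8859-1")); // e9
        System.out.println("UTF-8:      " + encodedHex(sample, "UTF-8"));      // c3a9
        System.out.println("US-ASCII:   " + encodedHex(sample, "US-ASCII"));   // 3f, the '?' byte
    }
}
```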

[Edited to have correct syntax]
[ December 14, 2005: Message edited by: Paul Clapham ]
 
Ranch Hand
Posts: 1923
I guess your environment is set up differently for normal operations, and for less:
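Whatever was posted with this reply is missing. One way to explore the idea, sketched here under assumptions, is to print the locale-related environment variables that the Java process actually sees and compare them with the ones in the interactive shell. The class and the `describe` helper are hypothetical; `LC_ALL`, `LC_CTYPE` and `LANG` are the usual POSIX locale variables, and `LESSCHARSET` is a variable that less consults for its own character set.

```java
public class LocaleEnvReport {

    // Returns "NAME=value", or "NAME=(unset)" when the variable is absent.
    static String describe(String name, String value) {
        return name + "=" + (value == null ? "(unset)" : value);
    }

    public static void main(String[] args) {
        String[] names = {"LC_ALL", "LC_CTYPE", "LANG", "LESSCHARSET"};
        for (String name : names) {
            // System.getenv returns null for variables that are not set.
            System.out.println(describe(name, System.getenv(name)));
        }
    }
}
```

If the values differ between the shell where "more" works and the environment the JVM is launched from, that would support the guess above.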

 
Ranch Hand
Posts: 43
I spent the last four days trying to figure this one out, and finally someone told me about this:

When starting up your JVM, specify the option "-Dfile.encoding=ISO-8859-1".
This did the trick for me. I know this is an old post, but I thought people looking for this thread would find an answer.
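To check whether the option took effect, a small sketch like the one below could be run with and without it. The class name and `report` helper are hypothetical, and how the JVM treats the "file.encoding" property has varied between Java versions, so the output should be taken as a diagnostic rather than a guarantee.

```java
import java.nio.charset.Charset;

public class DefaultCharsetReport {

    // Summarizes the system property alongside the charset the JVM actually resolved.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

For example, it could be launched as `java -Dfile.encoding=ISO-8859-1 DefaultCharsetReport` and the result compared with a plain `java DefaultCharsetReport` on the same machine.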
 
Ranch Hand
Posts: 1170
Can you tell us which Linux distribution this is? Some default to UTF-8 and some do not.
 