
Problems with non-english characters

Unnar Björnsson
Ranch Hand

Joined: Apr 30, 2005
Posts: 164
I'm trying to get my application to print the contents of a folder using cmd.exe. It works, with one exception: I'm Icelandic, and some of my folder names contain Icelandic characters such as 'á' and 'ý'.
The application initializes the String currentDir with this method:

currentDir holds the path that is passed to the "dir" command in cmd.exe, so when I type "ls" (my shell's command for "dir") it shows the contents of the folder located at currentDir.
In my case currentDir is C:\Documents and Settings\Torquemada\My Documents\Javaskrár\Unnar\Verkefni 1 stýrikerfi, which includes two Icelandic characters, 'á' and 'ý', so when I execute "ls" nothing happens.
I made a new string, testString = "C:\\Documents and Settings\\Torquemada\\My Documents\\Javaskrár\\Unnar\\Verkefni 1 stýrikerfi", and executed "ls" with testString as the argument instead of currentDir, and everything was fine. I even printed both strings with System.out.println() and got this:

C:\Documents and Settings\Torquemada\My Documents\Javaskrár\Unnar\Verkefni 1 sty
C:\Documents and Settings\Torquemada\My Documents\Javaskr�r\Unnar\Verkefni 1 st�
Equal? - false

The lower string is the one that works, but the upper one, which displays the string correctly, doesn't.

How do I fix it?
[ February 04, 2006: Message edited by: Unnar Björnsson ]
Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

Oh, I see. It took me a while to understand. You are executing the "cd" command and reading its output so you can get the current working directory. When you do that, you use something that doesn't use the system's default charset as Java understands it, and then you convert its output using the system's default charset. So any non-ASCII characters are converted using one scheme and then converted back using a different scheme. Hence the errors.
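To make the mismatch described above concrete, here is a minimal sketch of what happens when bytes written in one charset are decoded with another. The exact charsets involved on the original poster's machine are not stated in the thread, so UTF-8 and ISO-8859-1 are used here purely as stand-ins (both are guaranteed to exist on every JVM); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    /** Encodes text with one charset, then decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String original = "Javaskr\u00e1r"; // "Javaskrár", written with an escape to avoid source-encoding issues

        // Writer and reader disagree: the non-ASCII character is corrupted.
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Writer and reader agree: the string survives intact.
        String matched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("Equal with mismatched charsets? - " + original.equals(mismatched));
        System.out.println("Equal with matching charsets?   - " + original.equals(matched));
    }
}
```

The ASCII part of the path comes through unchanged either way, which is why only folders with Icelandic characters fail. If you do have to read a process's output, wrapping its stream in an InputStreamReader constructed with the charset the process actually writes in avoids the problem, but you have to know that charset.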

There's probably a way to fix that, but it would be easier to use a less horribly convoluted way of getting the current working directory. Like one of:

And if you were going to continue on with more Runtime.exec() calls to "dir" or "ls", to find the files in that directory, please look up the methods in the class that allow you to get the files in a directory inside Java without having to use OS-specific hacks like that.
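The code and class name from this reply are not preserved in the thread, so the following is only a sketch of the kind of approach being suggested, assuming the "user.dir" system property for the working directory and java.io.File for the listing. The class name ListCurrentDir and the helper entryNames are hypothetical.

```java
import java.io.File;

class ListCurrentDir {

    /** Returns the names of the entries in dir, or an empty array if it cannot be listed. */
    static String[] entryNames(File dir) {
        // File.list() returns null if dir is not a directory or an I/O error occurs.
        String[] names = dir.list();
        return names != null ? names : new String[0];
    }

    public static void main(String[] args) {
        // The JVM stores its working directory in the "user.dir" system property,
        // so no external "cd" command and no byte-to-char conversion is needed.
        File currentDir = new File(System.getProperty("user.dir"));
        System.out.println(currentDir.getAbsolutePath());

        // Equivalent of "dir" / "ls" without leaving Java.
        for (String name : entryNames(currentDir)) {
            System.out.println(name);
        }
    }
}
```

Because the path never leaves Java as raw bytes, it is never decoded with the wrong charset, so folder names with Icelandic characters stay intact inside the program.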
Unnar Björnsson
Ranch Hand

Joined: Apr 30, 2005
Posts: 164
That looks more promising, thanks!