file name encoding problem

Kevin Ton

Joined: Mar 14, 2008
Posts: 1
What encoding is used for the file name when you create a file like this:
String filename = "name including Chinese characters";
File file = new File(filename);

In the Java class, the filename string includes Chinese characters.
I thought the file name encoding might be affected by the OS charset. But when file.encoding is CP-1252 and I run the class, the file is created and its name looks fine, without garbled characters.

So my question is: which factors affect the file name encoding?

Greg Charles

Joined: Oct 01, 2001
Posts: 2931

Hi Kevin,

Welcome to Java Ranch!

I'm confused by a couple of points in your question. First, the file encoding being CP-1252. What file is encoded that way? If it's the Java source file, then I don't think you could even save code that contains Chinese characters, but I could be wrong. If it's the encoding of the file you are creating, that would affect the contents of the file, not its name. Are you ever seeing garbled characters? If so, where? In a command line directory listing? In a graphical file explorer? In an IDE?
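
To illustrate the point that a charset chosen for the output affects the bytes inside the file and not its name, here is a minimal sketch. The class and method names are made up for this example, and the temp file deliberately has an ASCII-only name so the name cannot be the variable. It writes the single character U+606D with two different charsets and reads the raw bytes back.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not from the original post.
class ContentVersusName {

    // Writes the text to a fresh temp file using the given charset
    // and returns the bytes that ended up on disk.
    static byte[] writeAndReadBack(String text, Charset charset) throws IOException {
        Path file = Files.createTempFile("content-encoding", ".txt"); // ASCII-only name
        try {
            Files.write(file, text.getBytes(charset));
            return Files.readAllBytes(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "\u606D"; // the character whose code is quoted in this thread
        System.out.println("UTF-8 bytes:    "
                + writeAndReadBack(text, StandardCharsets.UTF_8).length);
        System.out.println("UTF-16BE bytes: "
                + writeAndReadBack(text, StandardCharsets.UTF_16BE).length);
    }
}
```

The byte counts differ between the two charsets while the way the file is named stays exactly the same, which suggests the two concerns are handled separately.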

For what it's worth, I tried to put 恭贺新禧 into my Java source file inside the Eclipse IDE. In order to save the file, I had to change the file properties to set the encoding to UTF-16 or UTF-8 instead of Cp1252. In order to see the characters display in the Eclipse window, I had to change the font for the Java editor to Arial Unicode MS (I'm on Windows at the moment) and the Script to Chinese-GB2312.

I've never really understood that Script setting and how it relates to Unicode. It seems to me if I have a character code for 恭 (606D), and the font has a character matching that code, it should display it. Why do I have to tell it what script to use? Maybe someone can answer that for both of us!
Stephan van Hulst

Joined: Sep 20, 2010
Posts: 3989

Greg, I think Kevin was wondering what character set the OS uses to store file names in the file system tables, and if it can be influenced in any way by the user.

I'm afraid I don't have an answer, though.

The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.
Greg Charles

Joined: Oct 01, 2001
Posts: 2931

I believe all modern operating systems can store non-ASCII characters in file names. They might not be able to display them in all cases, though. For example, in Windows you might have to enable Asian Language Support before the system fonts used in the file explorer or the command window are updated to show Chinese characters in file names. That's why I asked Kevin where he's seeing garbled characters. Just to be sure though, Kevin, what OS are you using?
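
For anyone who wants to poke at this, here is a rough diagnostic sketch (the class and method names are invented for this example). It prints the charset-related settings the JVM reports and checks whether a given name can be represented in a given charset at all. As far as I know, on Windows the file system stores names as UTF-16 and the JVM uses the wide-character APIs, so file.encoding would not be involved in the name; on Unix-like systems names are byte sequences and the JVM converts them using a locale-derived charset. The sun.jnu.encoding property is an undocumented, implementation-specific property and may be absent on some JVMs, so treat it as a hint only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class, not from the original post.
class FileNameEncodingCheck {

    // True if every character of the name can be encoded in the given charset.
    static boolean representable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String name = "\u606D.txt"; // U+606D is the code quoted in this thread

        // Settings the JVM reports; sun.jnu.encoding is internal and may print null.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        // A single-byte Western charset cannot hold a Chinese character; UTF-8 can.
        System.out.println("ISO-8859-1 can encode name: "
                + representable(name, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 can encode name:      "
                + representable(name, StandardCharsets.UTF_8));
    }
}
```

If the name is not representable in a single-byte charset such as Cp1252 and the file name still comes out correctly, that would be consistent with the name not passing through that charset at all.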
Paul Clapham

Joined: Oct 14, 2005
Posts: 19728

It's also possible that Kevin is asking about what happens when you have Java source code containing non-ASCII characters. Presumably that source code, which is a text file, should be interpreted by the compiler as a text file in a certain encoding. If that encoding doesn't match the encoding which the editor was using when it created the file, then yes, problems are going to arise.
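
One way to take the source file encoding out of the picture is to write the characters as Unicode escapes, which are plain ASCII in the source. A small sketch follows; the class name is made up, and apart from 606D (quoted earlier in the thread) the code points are my assumption for the greeting Greg mentioned. The other route is to keep the literal characters and tell the compiler what the file is saved as, for example javac -encoding UTF-8.

```java
// Hypothetical example class, not from the original post.
class EscapedFileName {

    // The name is spelled with Unicode escapes, so the compiler reads the same
    // characters regardless of which encoding the source file was saved in.
    // Assumed code points: U+606D U+8D3A U+65B0 U+79A7.
    static String greeting() {
        return "\u606D\u8D3A\u65B0\u79A7";
    }

    public static void main(String[] args) {
        String name = greeting();
        // Print the code points in hex rather than the characters themselves,
        // so a console that cannot display Chinese does not confuse the result.
        for (int i = 0; i < name.length(); i++) {
            System.out.printf("U+%04X%n", (int) name.charAt(i));
        }
    }
}
```

If the hex values printed here are right but the file name on disk is still garbled, the source encoding is probably not the culprit.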