
file name encoding problem

Kevin Ton
Greenhorn

Joined: Mar 14, 2008
Posts: 1
What encoding is used for the file name when you create a file like this:
-----------------------------------------
String filename = "name including Chinese characters";
File file = new File(filename);  // pass the variable, not the literal "filename"
file.createNewFile();
-----------------------------------------

In the Java class, the filename string contains Chinese characters.
I thought the file name encoding might be affected by the OS charset. But when file.encoding is CP-1252 and I run the class, the file is created and its name appears correctly, without garbled characters.

So my question is: which factors affect the file name encoding?

Thanks,
Kevin
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2850
    

Hi Kevin,

Welcome to Java Ranch!

I'm confused by a couple of points in your question. First, the file encoding being CP-1252. What file is encoded that way? If it's the Java source file, then I don't even think you could save the code that contains Chinese characters, but I could be wrong. If it's the encoding on the file you are creating, that would affect the contents of the file, not its name. Are you ever seeing garbled characters? If so, where? In a command line directory listing? In a graphical file explorer? In an IDE?

For what it's worth, I tried to put 恭贺新禧 into my Java source file inside the Eclipse IDE. In order to save the file, I had to change the file properties to set the encoding to UTF-16 or UTF-8 instead of Cp1252. In order to see the characters display in the Eclipse window, I had to change the font for the Java editor to Arial Unicode MS (I'm on Windows at the moment) and the Script to Chinese-GB2312.

I've never really understood that Script setting and how it relates to Unicode. It seems to me if I have a character code for 恭 (606D), and the font has a character matching that code, it should display it. Why do I have to tell it what script to use? Maybe someone can answer that for both of us!
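A small sketch of the point about saving the source file: 恭 is code point U+606D, which UTF-8 can represent and Cp1252 (called windows-1252 in Java's charset names) cannot. The class and method names here are made up for illustration, and the character is written as a Unicode escape so the example does not depend on the encoding of its own source file.

```java
import java.nio.charset.Charset;

class CodePointDemo {
    // U+606D, the first character of the greeting quoted above, as an escape.
    static final String GONG = "\u606D";

    /** True if every character of text can be represented in the named charset. */
    static boolean encodable(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.printf("code point: U+%04X%n", GONG.codePointAt(0));
        System.out.println("fits in windows-1252: " + encodable(GONG, "windows-1252"));
        System.out.println("fits in UTF-8:        " + encodable(GONG, "UTF-8"));
    }
}
```

This only covers whether a charset can hold the character. It says nothing about the font or Script setting, which is a display question.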
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3646
    

Greg, I think Kevin was wondering what character set the OS uses to store file names in the file system tables, and if it can be influenced in any way by the user.

I'm afraid I don't have an answer, though.
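Not an answer either, but a possible starting point: a sketch that prints the encoding-related settings the running JVM reports. Note the assumptions: `sun.jnu.encoding` is an internal, undocumented property of Oracle/OpenJDK builds (it may be null on other JVMs), and the class name is hypothetical. As far as I know, NTFS stores names as UTF-16, which would be consistent with Kevin getting a correct file name on Windows even with file.encoding set to Cp1252.

```java
import java.nio.charset.Charset;

class EncodingSettings {
    /** Collects the encoding-related settings visible to this JVM. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("os.name          = ").append(System.getProperty("os.name")).append('\n');
        // Default charset for file *contents* when none is given explicitly.
        sb.append("default charset  = ").append(Charset.defaultCharset()).append('\n');
        sb.append("file.encoding    = ").append(System.getProperty("file.encoding")).append('\n');
        // Internal property (assumption: OpenJDK-style JVM); may print "null".
        sb.append("sun.jnu.encoding = ").append(System.getProperty("sun.jnu.encoding")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Comparing the output on different machines, or under different locale settings, should show which of these values actually change with the OS configuration.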
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2850
    
  11

I believe all modern operating systems have the ability to store foreign characters in the file name. They might not always be able to display them in all cases though. For example, in Windows you might have to enable Asian Language Support before the system fonts used in the file explorer or the command window would be updated to show you Chinese character file names. That's why I asked Kevin where he's seeing garbled characters. Just to be sure though Kevin, what OS are you using?
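One way to separate a storage problem from a display problem is to skip the display entirely: create the file, read the name back from the directory listing, and compare strings. A rough sketch, with a hypothetical class name and a name containing only U+606D (the character mentioned earlier in the thread); the result will depend on the OS, file system and locale it runs under.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class FileNameRoundTrip {
    /**
     * Creates a file with the given name in a fresh temp directory, lists the
     * directory, and reports whether the listed name equals the requested one.
     */
    static boolean survives(String name) throws IOException {
        File dir = Files.createTempDirectory("fname-test").toFile();
        try {
            new File(dir, name).createNewFile();
            String[] listed = dir.list();
            return listed != null && listed.length == 1 && listed[0].equals(name);
        } finally {
            // Remove whatever was created, even if its name came out altered.
            File[] leftovers = dir.listFiles();
            if (leftovers != null) {
                for (File f : leftovers) {
                    f.delete();
                }
            }
            dir.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ASCII name survives:   " + survives("plain.txt"));
        System.out.println("Chinese name survives: " + survives("\u606D.txt"));
    }
}
```

If the second line prints true while a file explorer or command window still shows garbage, the name was stored correctly and the issue is in fonts or console settings.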
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18566
    

It's also possible that Kevin is asking about what happens when you have Java source code containing non-ASCII characters. Presumably that source code, which is a text file, should be interpreted by the compiler as a text file in a certain encoding. If that encoding doesn't match the encoding which the editor was using when it created the file, then yes, problems are going to arise.
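The mismatch can be imitated in a few lines: encode the text the way the editor would write it, then decode the same bytes the way a compiler assuming a different encoding would read them. This is only a sketch with hypothetical names, not what javac does internally. With javac itself, the `-encoding` option is the usual way to state which encoding the source file is in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingMismatch {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "\u606D"; // U+606D, written as an escape
        Charset cp1252 = Charset.forName("windows-1252");

        // Editor saves as UTF-8, reader assumes Cp1252: one character per byte.
        String misread = reinterpret(original, StandardCharsets.UTF_8, cp1252);
        System.out.println("misread length:  " + misread.length());
        System.out.println("equals original: " + misread.equals(original));

        // Same encoding on both sides: the text comes back unchanged.
        String matched = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("matched equals:  " + matched.equals(original));
    }
}
```

A string literal damaged this way would then be passed to new File(...) already garbled, so the file name would be wrong regardless of how the OS stores names.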
 