1 Character seems to be written as one byte

Sev Zaslavsky

Joined: Nov 19, 2008
Posts: 7
All along it has been hammered into my head that in Java, characters are Unicode and occupy 2 bytes, but it seems as if FileWriter does not fully agree.

So I tried something basic: I wrote the little program below to write a character and read it back. Based on the output of the "dir" command in Vista, it seems that it's writing one byte, not two as I expected. I even tried using PrintWriter instead and I get the same result. Also, any characters beyond \u007F seem to be written and read back as ASCII 63.

Can anyone explain what's going on here?

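import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class Writer2 {
    public static void main(String[] args) {
        char[] in = new char[50]; // to store input
        int size = 0;
        try {
            File file = new File("fileWrite2.txt");
            FileWriter fw = new FileWriter(file);
            fw.write('A'); // the write call is missing from the post; 'A' is an assumed stand-in
            fw.close(); // flushes and closes, so the data is on disk before reading
            FileReader fr = new FileReader(file);
            size = fr.read(in); // read the whole file into the array
            System.out.print(size + " "); // how many chars read
            for (char c : in) // print the array
                System.out.println(c + "<->" + Integer.toString(c));
            fr.close(); // again, always close
        } catch (IOException e) {
            e.printStackTrace(); // report the failure instead of silently swallowing it
        }
    }
}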
Satish Chilukuri
Ranch Hand

Joined: Jun 23, 2005
Posts: 266
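It seems FileWriter doesn't use a Unicode encoding by default. You can check which encoding is in use by calling getEncoding() on your FileWriter instance (it is an instance method inherited from OutputStreamWriter, not a static one). Try using OutputStreamWriter and specifying the encoding explicitly: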

OutputStreamWriter fw = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
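To see what that suggestion does to the file size, here is a minimal sketch that writes a single character as UTF-8, reports the file length, and reads the character back with the same encoding. The class name Utf8RoundTrip, the helper methods, and the temp file are made up for illustration; only the java.io and java.nio.charset calls are standard.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Writes one char to the file as UTF-8 and returns the file length in bytes.
    public static long writeChar(File file, char c) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(c);
        }
        return file.length();
    }

    // Reads the first char back, decoding with the same encoding used to write it.
    public static int readChar(File file) throws IOException {
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8demo", ".txt"); // hypothetical scratch file
        file.deleteOnExit();
        System.out.println("'A'    -> " + writeChar(file, 'A') + " byte(s)");
        System.out.println("U+00E9 -> " + writeChar(file, '\u00E9') + " byte(s), read back as "
                + readChar(file));
    }
}
```

With UTF-8, characters up to U+007F still take one byte each, so a one-byte file for a plain ASCII character is expected even with an explicit Unicode encoding. The 2-byte size of a Java char describes its in-memory representation, not what a Writer puts on disk.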
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
To elaborate on what Satish said, if you don't specify the encoding during I/O, then the platform default encoding will be used. That's CP-1252 (I think) on Windows, MacRoman on OS X, and something else again on other variants of Unix/Linux. Rarely will it be some form of Unicode.
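As a rough illustration of the point about encodings, the sketch below prints the platform default charset and shows how many bytes the same single character takes under a few standard charsets. It also shows one plausible source of the "ASCII 63" seen in the question: when a charset cannot represent a character, String.getBytes substitutes the charset's replacement byte, which is '?' (63) for US-ASCII. The class name EncodingSizes and the helper encodedLength are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the text occupies when encoded with the given charset.
    public static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // a single Java char above U+007F

        // What FileWriter and FileReader fall back to when no encoding is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println("ISO-8859-1: " + encodedLength(eAcute, StandardCharsets.ISO_8859_1) + " byte(s)");
        System.out.println("UTF-8:      " + encodedLength(eAcute, StandardCharsets.UTF_8) + " byte(s)");
        System.out.println("UTF-16BE:   " + encodedLength(eAcute, StandardCharsets.UTF_16BE) + " byte(s)");

        // US-ASCII has no mapping for this character, so the encoder emits '?' instead.
        byte replaced = eAcute.getBytes(StandardCharsets.US_ASCII)[0];
        System.out.println("US-ASCII byte value: " + replaced);
    }
}
```

Which characters get replaced depends on the charset actually in use, so the output of the first line is the thing to check on the machine where the problem shows up.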