1 Character seems to be written as one byte

 
Greenhorn
Posts: 7
All along it has been hammered into my head that in Java, characters are Unicode and occupy 2 bytes, but it seems that FileWriter does not fully agree.

So I tried something basic: I wrote the little program below to write a character and read it back. Based on the output of the "dir" command in Vista, it seems to write one byte, not two as I expected. I even tried using PrintWriter instead and got the same result. Also, any characters beyond \u007F seem to be written and read back as ASCII 63 ('?').

Can anyone explain what's going on here?

import java.io.*;

class Writer2 {
    public static void main(String[] args) {
        char[] in = new char[50]; // to store input
        File file = new File("fileWrite2.txt");
        try {
            // write a single character outside the ASCII range
            try (FileWriter fw = new FileWriter(file)) {
                fw.write('\u0100');
            }
            // read it back; try-with-resources closes the reader
            try (FileReader fr = new FileReader(file)) {
                int size = fr.read(in);
                System.out.println(size + " char(s) read"); // chars, not bytes
                for (int i = 0; i < size; i++) { // print only what was read
                    System.out.println(in[i] + "<->" + (int) in[i]);
                }
            }
        } catch (IOException e) {
            e.printStackTrace(); // don't swallow I/O errors silently
        }
    }
}
 
Ranch Hand
Posts: 266
It seems FileWriter doesn't use a Unicode encoding by default. You can check which encoding it uses by calling getEncoding() on the writer instance, e.g. fw.getEncoding(). Try using OutputStreamWriter and specifying the encoding explicitly:

OutputStreamWriter fw = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
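
For illustration, here is a minimal sketch of the full round trip with the encoding named on both the write and the read side. The class name Utf8RoundTrip, the helper methods and the temp file are assumptions made for this example, not part of the original program. In UTF-8, U+0100 is encoded as two bytes (0xC4 0x80).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Writes one char as UTF-8 and returns the resulting file size in bytes.
    static long writeChar(File file, char c) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(c);
        }
        return file.length();
    }

    // Reads the first char back, decoding the file as UTF-8.
    static int readChar(File file) throws IOException {
        try (Reader r = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file, used so the example cleans up after itself
        File file = File.createTempFile("fileWrite2", ".txt");
        file.deleteOnExit();

        long bytes = writeChar(file, '\u0100');
        int c = readChar(file);
        // prints: 2 byte(s) on disk, read back U+0100
        System.out.println(bytes + " byte(s) on disk, read back U+"
                + String.format("%04X", c));
    }
}
```

The reader has to be given the same encoding as the writer; decoding the file with a different charset can yield different characters than were written.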
 
Rancher
Posts: 43026
To elaborate on what Satish said: if you don't specify the encoding during I/O, the platform default encoding is used. That's CP-1252 (I think) on Windows, MacRoman on OS X, and something else again on other variants of Unix/Linux. Rarely will it be some form of Unicode.
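
A small sketch of how to see this for yourself: it prints the default charset of the running JVM and shows how many bytes U+0100 takes in a few charsets from StandardCharsets. The class name EncodingSizes and the helper encodedLength are made up for this example. Charsets that cannot represent the character substitute '?' (byte 63), which matches the "ASCII 63" observation in the question. Note that the default depends on the platform and Java version; from JDK 18 onward the default charset is UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the given charset produces for the text.
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u0100";

        // Platform and JVM dependent, so no fixed output is shown here
        System.out.println("default charset: " + Charset.defaultCharset());

        // Unmappable characters are replaced with '?' (63) by getBytes
        System.out.println("US-ASCII first byte:   "
                + s.getBytes(StandardCharsets.US_ASCII)[0]);   // 63
        System.out.println("ISO-8859-1 first byte: "
                + s.getBytes(StandardCharsets.ISO_8859_1)[0]); // 63

        System.out.println("US-ASCII bytes: "
                + encodedLength(s, StandardCharsets.US_ASCII)); // 1
        System.out.println("UTF-8 bytes:    "
                + encodedLength(s, StandardCharsets.UTF_8));    // 2
        System.out.println("UTF-16BE bytes: "
                + encodedLength(s, StandardCharsets.UTF_16BE)); // 2
    }
}
```

So a Java char is a 16-bit value in memory, but the number of bytes that reach the file is decided by the encoding the writer uses.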
 