Converting between bytes and characters is done via a character encoding. A character encoding defines which sequence of bytes represents which character. ASCII, for example, is a character encoding that specifies that the byte value 65 means 'A', and so on.
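You can see this mapping directly from Java. A minimal sketch using the standard `java.nio.charset.StandardCharsets` class: encoding the string "A" with ASCII produces exactly one byte, with the value 65.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    public static void main(String[] args) {
        // Encoding "A" with the ASCII encoding yields a single byte, value 65
        byte[] bytes = "A".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes.length); // 1
        System.out.println(bytes[0]);     // 65
    }
}
```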
The ASCII character encoding is very limited: it defines only 128 characters, far too few to encode all the different characters that are used around the world. So, over time, people came up with a whole range of other character encodings. In some of these encodings, a character is represented by two bytes, or even by a variable number of bytes.
Unicode is a standard way of dealing with text. It defines a family of character encodings, such as UTF-8 and UTF-16, to encode characters. UTF-8 is a variable-length encoding in which a character takes up between one and four bytes.
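The variable length of UTF-8 is easy to demonstrate by encoding a few characters and counting the resulting bytes (the characters are written as Unicode escapes so the source file compiles regardless of the compiler's source encoding):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        // Number of bytes each character needs in UTF-8
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);            // 1 (ASCII range)
        System.out.println("\u00E9".getBytes(StandardCharsets.UTF_8).length);       // 2 (é, U+00E9)
        System.out.println("\u20AC".getBytes(StandardCharsets.UTF_8).length);       // 3 (€, U+20AC)
        System.out.println("\uD834\uDD1E".getBytes(StandardCharsets.UTF_8).length); // 4 (𝄞, U+1D11E)
    }
}
```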
Internally, Java represents text in UTF-16: a char is a single 16-bit UTF-16 code unit, and characters outside the Basic Multilingual Plane are represented by a pair of chars (a surrogate pair). If you cast a char to an int, you get the value of its UTF-16 code unit.
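A short sketch of what those casts look like in practice, including a character that needs a surrogate pair (the musical G clef, U+1D11E, written as Unicode escapes):

```java
public class CharCodes {
    public static void main(String[] args) {
        char c = 'A';
        int code = c;                        // implicit widening: UTF-16 code unit 65
        System.out.println(code);            // 65
        System.out.println((int) '\u20AC');  // 8364 (€, U+20AC)

        // Characters above U+FFFF take two char values (a surrogate pair):
        String clef = "\uD834\uDD1E";        // 𝄞, U+1D11E
        System.out.println(clef.length());        // 2 UTF-16 code units
        System.out.println(clef.codePointAt(0));  // 119070, the real code point
    }
}
```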
Java has two kinds of classes for doing I/O:
Streams (InputStream and OutputStream) are for reading and writing bytes.
Readers and Writers (for example, FileReader, PrintWriter) are for reading and writing characters (text). They decode and encode between bytes and characters using a character encoding.
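The distinction can be sketched with the in-memory variants of each kind of class (ByteArrayOutputStream for bytes, StringWriter for characters), so no files are needed:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class StreamsVsWriters {
    public static void main(String[] args) throws IOException {
        // A stream deals in raw bytes: 72 and 105 are the ASCII codes for 'H' and 'i'
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {72, 105});

        // A Writer deals in characters; no byte values appear in this code
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            pw.print("Hi");
        }

        // Decoding the stream's bytes with ASCII yields the same text
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
        System.out.println(sw.toString());
    }
}
```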
Some Readers and Writers allow you to specify the character encoding that should be used. For example, InputStreamReader has constructors that allow you to specify the encoding.
If you don't specify the encoding, Java uses the default charset, which is UTF-8 since Java 18 and, on older versions, depends on the operating system and locale.
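Why the explicit encoding matters can be shown by decoding the same bytes two different ways. Below, "héllo" is encoded as UTF-8 (é becomes the two bytes 0xC3 0xA9) and then read back through two InputStreamReaders; the Latin-1 reader misinterprets those two bytes as two separate characters. The `readAll` helper is just for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) throws IOException {
        // The platform default (UTF-8 on Java 18+, system-dependent before)
        System.out.println(Charset.defaultCharset());

        // "h\u00E9llo" is "héllo"; in UTF-8 the é takes two bytes
        byte[] bytes = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);

        Reader utf8 = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        Reader latin1 = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);

        System.out.println(readAll(utf8));    // héllo (decoded correctly)
        System.out.println(readAll(latin1));  // hÃ©llo (each UTF-8 byte read as one Latin-1 char)
    }

    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        return sb.toString();
    }
}
```

Choosing the wrong decoder is exactly how "mojibake" like hÃ©llo ends up in log files and web pages.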