
UTF encoding

Tom Tang
Ranch Hand

Joined: Dec 24, 2000
Posts: 133
According to R&H, UTF encoding uses as many bits as needed to encode a character (p. 439). However, Bill Brogden says in his Exam Cram book (p. 287) that a single character (using the UTF-8 encoding scheme) may end up encoded in one, two, or three bytes, but not more.
Which version should we follow in the real test? There are self-test questions based on the above point in both books.
I also checked Khalid's book, which says "the UTF8 encoding has a multi-byte encoding format" (p. 570). So he can't be wrong either way.
Please also note that the three books use different formats to refer to this encoding scheme, which itself might reflect my point.
Brogden: UTF-8
Khalid: UTF8
Can someone shed more light?
[This message has been edited by Tom Tang (edited February 11, 2001).]

Sun Certified Java Programmer
Tom Tang
Ranch Hand

Joined: Dec 24, 2000
Posts: 133
I checked and found out that there are UTF-8, UTF-16, and UTF-32 encodings. Maybe that answers the question, but I would still like anybody who has better knowledge to shed more light.
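For what it's worth, the three encodings differ in how many bytes they spend per character. Below is a minimal sketch that counts the encoded bytes for a few sample strings. The class name `UtfFamilySizes` and its helper methods are made up for illustration, and it assumes the JRE ships a `UTF-32BE` charset, which is not one of the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UtfFamilySizes {
    // Number of bytes the string occupies in standard UTF-8.
    static int utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian variant, so no byte-order mark is added).
    static int utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of bytes in UTF-32 (assumes the JRE provides the "UTF-32BE" charset).
    static int utf32(String s) {
        return s.getBytes(Charset.forName("UTF-32BE")).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, plain ASCII
            "\u00E9",       // U+00E9, Latin small e with acute
            "\u4E2D",       // U+4E2D, a CJK ideograph
            "\uD83D\uDE00"  // U+1F600, stored in Java as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    s.codePointAt(0), utf8(s), utf16(s), utf32(s));
        }
    }
}
```

Note that standard UTF-8 only needs a fourth byte for code points above U+FFFF, which a single 16-bit Java char cannot hold.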
Tom Tang
Ranch Hand

Joined: Dec 24, 2000
Posts: 133
As usual, I found the answer on Maha Anna's discussion page:
"Java uses a system called UTF for I/O to support international character sets."
True. Java uses a conversion method called UTF-8, which is a subset of UTF. It is a subset in the sense that in true UTF a char can be encoded in anywhere from 1 byte to any number of bytes. This means we can cover all chars in all languages in the world, so UTF is a true transformation format. A small char can take fewer bytes, while a "big look-and-feel" Asian char may be encoded with many bytes.
Since all chars in Java have a maximum of 16 bits (a Unicode char), all I/O operations that need to transform chars to and from bytes (all readers/writers) use a predefined transformation format, i.e., a char can be encoded in 1, 2, or 3 bytes only, with a maximum of 3 bytes. There are rules for which chars are encoded with how many bytes; they are in the Java docs. I also recently found an error in Bill Brogden's book here which illustrates this concept.
the link
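The 1, 2, or 3 byte rule can be checked directly. Here is a minimal sketch, assuming the "predefined transformation format" in the quote is the one written by `DataOutputStream.writeUTF` (the Java docs call it modified UTF-8). The class `UtfByteCounts` and its helpers are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class UtfByteCounts {
    // Bytes used by standard UTF-8 for one 16-bit char.
    // Only meaningful for non-surrogate chars; a lone surrogate cannot be encoded.
    static int utf8Bytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes used by DataOutputStream.writeUTF for one char,
    // not counting the 2-byte length prefix that writeUTF emits first.
    static int writeUtfBytes(char c) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(String.valueOf(c));
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        // U+0041 (ASCII), U+00E9 (Latin-1 range), U+4E2D (CJK ideograph)
        char[] samples = { 'A', '\u00E9', '\u4E2D' };
        for (char c : samples) {
            System.out.printf("U+%04X: UTF-8 %d byte(s), writeUTF %d byte(s)%n",
                    (int) c, utf8Bytes(c), writeUtfBytes(c));
        }
    }
}
```

The two formats agree on these samples. One place they differ is the null char U+0000, which `writeUTF` encodes in two bytes while standard UTF-8 uses one.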
[This message has been edited by Tom Tang (edited February 12, 2001).]