UTF encoding

 
Ranch Hand
Posts: 133
According to R&H (p. 439), UTF encoding uses as many bits as are needed to encode a character. However, Bill Brogden says in his Exam Cram book (p. 287) that a single character in the UTF-8 encoding scheme may end up encoded in one, two, or three bytes, but not more.
Which version should we follow on the real test? There are self-test questions based on this point in both books.
I also checked the Khalid book, which says "the UTF8 encoding has a multi-byte encoding format" (p. 570), which is general enough that it can't contradict either of the other two.
Please also note that the three books use different names for this encoding scheme, which may itself reflect my point:
R&H: UTF
Brogden: UTF-8
Khalid: UTF8
Can someone shed more light?
[This message has been edited by Tom Tang (edited February 11, 2001).]
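For concreteness, here is a small sketch of how many bytes common characters actually take in UTF-8, written in modern Java (the `StandardCharsets` class is from later JDKs than these books cover, and the class name `Utf8Lengths` is my own, not from any of the books):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        // 'A' (U+0041) fits in 7 bits, so UTF-8 uses 1 byte
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);      // 1
        // 'é' (U+00E9) is above 0x7F, so UTF-8 uses 2 bytes
        System.out.println("\u00e9".getBytes(StandardCharsets.UTF_8).length); // 2
        // CJK character U+4E2D is above 0x7FF, so UTF-8 uses 3 bytes
        System.out.println("\u4e2d".getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```

So for any character that fits in Java's 16-bit char, UTF-8 never needs more than 3 bytes, which is consistent with Brogden's claim.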
 
Tom Tang
I checked www.unicode.org and found that there are UTF-8, UTF-16, and UTF-32 encodings. Maybe that answers the question, but I would still welcome anyone with better knowledge to shed more light.
 
Tom Tang
As usual, I found the answer at Maha Anna's discussion page:
Java uses a system called UTF for I/O to support international character sets.
True. Java uses a conversion format called UTF-8, which is one member of the UTF family. In general UTF, a character can be encoded in anywhere from 1 byte to ANY number of bytes, which means ALL CHARACTERS IN ALL THE WORLD'S LANGUAGES can be covered. So UTF is a true transformation: a simple character can take fewer bytes, while at the same time a complex Asian character may be encoded with many bytes. Since in Java every char is at most 16 bits (a Unicode character), all I/O operations that transform chars to bytes (all readers/writers) use a predefined transformation format: a char is encoded in 1, 2, or 3 bytes ONLY, 3 bytes at most. There are rules for which characters are encoded with how many bytes; they are in the Java docs. I also recently found an error in Bill Brogden's book here which illustrates this concept.
The link: www.javaranch.com/maha/Discussions/java_io_Package/true-false_-_JavaRanch_Big_Moose_Saloon.htm
[This message has been edited by Tom Tang (edited February 12, 2001).]
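The "max 3 bytes" rule described above is exactly what `DataOutputStream.writeUTF` implements (Java's so-called modified UTF-8). A small sketch, assuming a modern JDK (the class name `ModifiedUtf8` and the helper `utfLength` are my own):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8 {
    // writeUTF prepends a 2-byte length field, so subtract 2 to get the
    // number of bytes used to encode the characters themselves
    static int utfLength(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeUTF(s);
        return bos.size() - 2;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(utfLength("A"));      // 1: \u0001-\u007F take 1 byte
        System.out.println(utfLength("\u00e9")); // 2: \u0080-\u07FF take 2 bytes
        System.out.println(utfLength("\u4e2d")); // 3: \u0800-\uFFFF take 3 bytes
        System.out.println(utfLength("\u0000")); // 2: NUL is the special case,
                                                 //    encoded as 0xC0 0x80
    }
}
```

Note one quirk of the modified format: unlike standard UTF-8, Java encodes the null character in 2 bytes so the output never contains an embedded zero byte.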
 