
How many bits are there for UTF characters?

 
weiliu lili
Ranch Hand
Posts: 46
How many bits are there for UTF characters?
 
Thomas Kijftenbelt
Ranch Hand
Posts: 73
Hi,
UTF-8 uses 1 to 4 bytes per character (1 to 3 bytes for characters in the Basic Multilingual Plane); the number of bytes depends on the character.
Greetings,
TK
SCJP
 
Jamal Hasanov
Ranch Hand
Posts: 411
At most 24 bits (3 bytes) for a character in the Basic Multilingual Plane.
Jamal Hasanov
www.j-think.com
 
John Dale
Ranch Hand
Posts: 399
I presume that by UTF you mean the widely used UTF-8. UTF-8 uses 1 to 4 bytes (8 to 32 bits) per Unicode character, depending on the character. Every character in the Basic Multilingual Plane, which covers most text in everyday use, fits in 1 to 3 bytes.
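To see the byte counts for yourself, here is a minimal Java sketch. The class name `Utf8Sizes`, the helper `utf8Bytes`, and the sample characters are my own picks for illustration; the only library call is the standard `String.getBytes(Charset)`. The last sample is a character outside the Basic Multilingual Plane, which takes 4 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from any library.
class Utf8Sizes {
    // Number of bytes standard UTF-8 needs to encode the given text.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Sample characters: ASCII letter, accented Latin letter,
        // euro sign, and an emoji written as a surrogate pair.
        String[] samples = { "A", "\u00E9", "\u20AC", "\uD83D\uDE00" };
        for (String s : samples) {
            int bytes = utf8Bytes(s);
            System.out.printf("U+%04X: %d byte(s), %d bits%n",
                    s.codePointAt(0), bytes, bytes * 8);
        }
    }
}
```

Note that this measures standard UTF-8. The "modified UTF-8" written by `DataOutputStream.writeUTF` differs slightly, so do not assume the same counts there.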
There are other UTF formats, like UTF-16, that represent the data differently. UTF-16 uses one 16-bit code unit for each character in the Basic Multilingual Plane and two code units (a surrogate pair, 32 bits) for every other character.
UTF-16 has the advantage that most characters in common use are the same size, while UTF-8 usually takes less space, at least if most of the characters can be encoded in 8 bits, like the displayable ASCII characters.
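The size trade-off between the two encodings can be checked with a short Java sketch. The class `Utf16Sizes` and its helper names are hypothetical, chosen only for this example. It uses `UTF_16BE` so that no byte-order mark is added to the count, and relies on the fact that a Java `String` is a sequence of UTF-16 code units.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from any library.
class Utf16Sizes {
    // Number of 16-bit UTF-16 code units (Java chars) in the text.
    static int utf16Units(String text) {
        return text.length();
    }

    // Number of Unicode code points (characters) in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    // Encoded size in bytes as UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encoded size in bytes as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Plain ASCII text, and a single emoji written as a surrogate pair.
        String[] samples = { "hello", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("%d code point(s): %d UTF-16 unit(s), "
                    + "%d bytes as UTF-16, %d bytes as UTF-8%n",
                    codePoints(s), utf16Units(s), utf16Bytes(s), utf8Bytes(s));
        }
    }
}
```

For ASCII-heavy text, UTF-8 comes out at half the size of UTF-16. The emoji shows that a single character can still occupy two UTF-16 code units.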
UTF-8 is more likely to be used when it is known that the data will be accessed serially, as when it is sent across the network. UTF-16 is more likely to be used when the data might be accessed in random order, as in a file. For example, Windows NT/2000 use UTF-16 to store Unicode data on disk.
For an introduction to Unicode and encoding, you might look at The Unicode® Standard: A Technical Introduction.
 