
number of characters in UTF-8

 
maya fur
Greenhorn
Posts: 11
Hi,
I'm trying to find the number of characters (letters) in a word that is encoded in UTF-8, not the number of bytes.
String.length() returns the number of UTF-16 code units.
For most words a plain String.length() works, but take this example:
String s = "文書の場合";
The length (s.length()) is 9.

I am using the String constructor String(byte[] bytes, String charsetName), but the length() method still counts in UTF-16.
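
A minimal sketch of the approach described above, assuming the input bytes really are valid UTF-8. The class name Utf8Length and the helper countCodePoints are made up for illustration; String.codePointCount and StandardCharsets.UTF_8 are standard JDK APIs. The example string is written with Unicode escapes so the result does not depend on how the source file is encoded. If a different length shows up in practice, one possible (unconfirmed) cause is that the text was decoded with another charset somewhere along the way.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Hypothetical helper: decode UTF-8 bytes, then count Unicode code points
    // instead of UTF-16 code units.
    static int countCodePoints(byte[] utf8Bytes) {
        String s = new String(utf8Bytes, StandardCharsets.UTF_8);
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The five characters from the question, as Unicode escapes.
        String s = "\u6587\u66F8\u306E\u5834\u5408";
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(bytes.length);           // 15 (three bytes per character here)
        System.out.println(s.length());             // 5 UTF-16 code units
        System.out.println(countCodePoints(bytes)); // 5 code points
    }
}
```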


Thanks
 
Michael Lloyd Lee
Greenhorn
Posts: 22
Can you post the string as Unicode escapes (i.e. \uxxxx)?

How about: toCharArray().length?
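
For comparison, a small sketch of how toCharArray().length relates to the code point count. The class and method names are hypothetical. toCharArray() returns the same UTF-16 code units that length() counts, so the two only differ from the code point count when the string contains characters outside the Basic Multilingual Plane, which Java stores as surrogate pairs.

```java
public class CodePointDemo {
    // Same value as s.length(): one array element per UTF-16 code unit.
    static int codeUnits(String s) {
        return s.toCharArray().length;
    }

    // Counts Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as a surrogate pair.
        String s = "a\uD83D\uDE00";

        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
    }
}
```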
 
Ulf Dittmer
Rancher
Posts: 42967
This article is highly relevant to your question.
 
maya fur
Greenhorn
Posts: 11
Thank you very much for the article! It was very helpful!
 