The moose likes Java in General and the fly likes number of characters in UTF-8 Big Moose Saloon
JavaRanch » Java Forums » Java » Java in General

number of characters in UTF-8

maya fur

Joined: Sep 08, 2005
Posts: 11
I'm trying to find the number of characters (letters) in a word that is in UTF-8 format, not the number of bytes.
String.length() returns the length in UTF-16 code units (Java chars).
For most words the simple String.length() works, but for example:
String s = "文書の場合";
The length (s.length()) is 9.
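A minimal sketch of counting Unicode code points instead of UTF-16 units, using the standard String.codePointCount method. The class name CodePointLength is hypothetical. The literal is written with Unicode escapes so the result does not depend on the encoding javac assumes for the source file. A source file compiled with a different encoding than it was saved in is one possible cause of an unexpected length(), though that is only an assumption about the case above. For these five BMP characters, length() and the code-point count agree. They differ only for characters outside the BMP, which need a surrogate pair.

```java
public class CodePointLength {
    // Number of Unicode code points, which is usually what "characters" means.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The five characters from the question, written as escapes so the
        // compiler's source encoding cannot change them.
        String japanese = "\u6587\u66F8\u306E\u5834\u5408";
        // U+1D11E (musical G clef) lies outside the BMP: two chars, one code point.
        String clef = "\uD834\uDD1E";

        System.out.println(japanese.length() + " " + codePoints(japanese)); // 5 5
        System.out.println(clef.length() + " " + codePoints(clef));         // 2 1
    }
}
```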

I'm using the String constructor String(byte[] bytes, String charsetName), but length() still counts UTF-16 units.
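The constructor only decodes the bytes. The resulting String is always stored as UTF-16, so length() keeps counting UTF-16 units no matter which charset was used. A minimal sketch that separates the UTF-8 byte count from the character count. The class and method names are hypothetical. It uses the String(byte[], Charset) overload with StandardCharsets.UTF_8 (Java 7+), which avoids the checked UnsupportedEncodingException thrown by the charsetName variant.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Counts {
    // Number of bytes needed to encode s as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decode UTF-8 bytes, then count code points in the resulting
    // (internally UTF-16) String.
    static int charCountFromUtf8(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String word = "\u6587\u66F8\u306E\u5834\u5408";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8ByteCount(word));     // 15 (3 bytes per character here)
        System.out.println(charCountFromUtf8(utf8)); // 5
    }
}
```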

Michael Lloyd Lee

Joined: Sep 07, 2005
Posts: 22
Can you post the string as Unicode escapes (i.e. \uxxxx)?
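One way to produce those escapes is to print each UTF-16 char of the string in hex. This is a minimal sketch; the class name UnicodeEscapes is hypothetical. Because it walks chars rather than code points, a character outside the BMP would appear as two escapes (its surrogate pair), which matches how Java source literals write it.

```java
public class UnicodeEscapes {
    // Render every UTF-16 char of s as a Java-style escape sequence,
    // using four upper-case hex digits per char.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pass the string to inspect as the first argument, or use a sample.
        String s = args.length > 0 ? args[0] : "\u6587\u66F8\u306E\u5834\u5408";
        System.out.println(toEscapes(s));
    }
}
```

With no arguments, this prints \u6587\u66F8\u306E\u5834\u5408. If the string read from a file or the command line shows more escapes than expected, the bytes were probably decoded with the wrong charset.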

How about: toCharArray().length?
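toCharArray() returns the same UTF-16 chars the String holds, so toCharArray().length always equals length(). If "letters" means what a reader sees on screen, even code points can overcount. For example, "e" followed by a combining accent is two code points but one visible letter. A minimal sketch using the JDK's java.text.BreakIterator character instance, which approximates user-perceived characters. The class name LetterCount is hypothetical.

```java
import java.text.BreakIterator;

public class LetterCount {
    // Count user-perceived characters by walking the boundaries
    // reported by a character BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible letter.
        String decomposed = "e\u0301";
        System.out.println(decomposed.toCharArray().length);                   // 2, same as length()
        System.out.println(decomposed.codePointCount(0, decomposed.length())); // 2
        System.out.println(graphemes(decomposed));                             // 1
    }
}
```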

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42959
This article is highly relevant to your question.
maya fur

Joined: Sep 08, 2005
Posts: 11
Thank you very much for the article! It was very helpful!
I agree. Here's the link: