number of characters in UTF-8

maya fur

Joined: Sep 08, 2005
Posts: 11
I'm trying to find the number of chars (letters) in a word that is in UTF-8 format - not the number of bytes.
String.length() returns the length in UTF-16 code units.
For most words the simple String.length() works, but for example:
String s = "文書の場合";
The length (s.length()) is 9.

I am using the String constructor String(byte[] bytes, String charsetName), but the length method still returns the UTF-16 length.
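
For reference, all five characters in the example word are in the Basic Multilingual Plane, so length() should report 5 for it. A result of 9 suggests the text was decoded with a different charset somewhere (for instance by the compiler reading the source file), though that is only a guess from the numbers in the post. Below is a minimal sketch for checking this. The class name Utf8LengthCheck and the helper decodeUtf8 are made up for illustration, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck {
    // The five characters from the post, written as Unicode escapes so the
    // result does not depend on the encoding javac assumes for the source file.
    static final String WORD = "\u6587\u66F8\u306E\u5834\u5408";

    // Hypothetical helper: decode raw UTF-8 bytes with an explicit charset
    // instead of relying on the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = WORD.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);               // 15 (three bytes per character)
        System.out.println(decodeUtf8(utf8).length()); // 5 (one UTF-16 code unit each)
    }
}
```

If the bytes come from a file or socket, the same explicit charset would need to be passed wherever they are turned into a String.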

Michael Lloyd Lee

Joined: Sep 07, 2005
Posts: 22
Can you post the string as unicode escapes? (i.e. \uxxxx)

How about: toCharArray().length?
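
Note that toCharArray().length gives the same number as length(), since both count UTF-16 code units. If the goal is to count Unicode code points, String.codePointCount (available since Java 5) is one option. Here is a rough sketch; the class name CodePointCount and the method countCodePoints are only illustrative names, and code points are still not the same thing as user-perceived characters when combining marks are involved.

```java
public class CodePointCount {
    // Counts Unicode code points, so a surrogate pair is counted once.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The word from the question: five BMP characters.
        String word = "\u6587\u66F8\u306E\u5834\u5408";
        // Letter a followed by U+1D11E, which needs a surrogate pair in UTF-16.
        String withPair = "a\uD834\uDD1E";

        System.out.println(word.length());                 // 5
        System.out.println(countCodePoints(word));         // 5
        System.out.println(withPair.toCharArray().length); // 3
        System.out.println(countCodePoints(withPair));     // 2
    }
}
```

For strings made only of BMP characters the two counts agree, which is why length() works for most words.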

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
This article is highly relevant to your question.
maya fur

Joined: Sep 08, 2005
Posts: 11
Thank you very much for the article! It was very helpful!