number of characters in UTF-8

maya fur

Joined: Sep 08, 2005
Posts: 11
I'm trying to find the number of characters (letters) in a word encoded in UTF-8, not the number of bytes.
String.length() returns the length in UTF-16 code units.
For most words the simple String.length() works, but for example:
String s = "文書の場合";
The length (s.length()) is 9.
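
For reference, here is a minimal sketch of the difference. String.length() counts UTF-16 code units (char values), while String.codePointCount(int, int), available since Java 5, counts Unicode code points and treats a surrogate pair as one. All five characters in 文書の場合 are in the Basic Multilingual Plane, so both methods should return 5 for that string. The two counts only diverge for characters above U+FFFF. A result of 9 may mean the source file was compiled with a different encoding than it was saved in (see javac's -encoding option), though the thread doesn't confirm this. That is why the sketch writes the string with \u escapes. The class and method names are hypothetical.

```java
public class CodePointCount {
    // Number of UTF-16 code units (what String.length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u6587\u66F8\u306E\u5834\u5408"; // 文書の場合, all in the BMP
        String supplementary = "a\uD83D\uDE00";        // 'a' followed by U+1F600 (a surrogate pair)

        System.out.println(utf16Units(bmp) + " " + codePoints(bmp));                     // 5 5
        System.out.println(utf16Units(supplementary) + " " + codePoints(supplementary)); // 3 2
    }
}
```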

I tried using the String constructor String(byte[] bytes, String charsetName), but length() still counts UTF-16 code units.
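
Decoding with new String(bytes, charset) produces an ordinary UTF-16 String again, so length() still reports code units. The sketch below shows two ways to get a character count: decode and then call codePointCount, or count directly in the byte array, because every code point starts with a byte that is not a continuation byte (bit pattern 10xxxxxx). It assumes well-formed UTF-8 input, and the class and method names are hypothetical. It uses the StandardCharsets.UTF_8 overload (Java 7+), which, unlike the String charsetName version, does not throw UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Utf8CharCount {
    // Counts code points directly in UTF-8 bytes by skipping continuation
    // bytes (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
    static int countUtf8CodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "\u6587\u66F8\u306E\u5834\u5408"; // 文書の場合
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        // Decoding gives back a UTF-16 String; count code points on it.
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(utf8.length);                                 // 15 (bytes)
        System.out.println(countUtf8CodePoints(utf8));                   // 5
        System.out.println(decoded.codePointCount(0, decoded.length())); // 5
    }
}
```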

Michael Lloyd Lee

Joined: Sep 07, 2005
Posts: 22
Can you post the string as Unicode escapes (i.e. \uxxxx)?
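
Such escapes can be produced with a small helper like the one below. This sketch assumes you want one escape per UTF-16 code unit, so a supplementary character comes out as two escapes (its surrogate pair). The class and method names are made up for illustration.

```java
public class UnicodeEscapes {
    // Renders each UTF-16 code unit as a backslash-u escape with four
    // uppercase hex digits, the form Java source code accepts.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u6587\u66F8\u306E\u5834\u5408"; // 文書の場合
        System.out.println(toEscapes(s));
    }
}
```

Run on 文書の場合, it should print \u6587\u66F8\u306E\u5834\u5408.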

How about: toCharArray().length?
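
Note that toCharArray().length gives the same number as length(), because both count UTF-16 chars. If "letters" means what a reader sees, a combining sequence such as e followed by U+0301 (combining acute accent) is two code points but one visible letter. One option, sketched below with a hypothetical class name, is java.text.BreakIterator.getCharacterInstance(), whose boundaries are documented to keep combining sequences and surrogate pairs together.

```java
import java.text.BreakIterator;

public class GraphemeCount {
    // Counts user-perceived characters using the JDK's character BreakIterator,
    // which keeps combining sequences and surrogate pairs together.
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "cafe\u0301"; // "café" written with a combining acute accent

        System.out.println(s.toCharArray().length);          // 5, same as s.length()
        System.out.println(s.codePointCount(0, s.length())); // 5
        System.out.println(countGraphemes(s));               // 4
    }
}
```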

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42952
This article is highly relevant to your question.
maya fur

Joined: Sep 08, 2005
Posts: 11
Thank you very much for the article! It was very helpful!