


number of characters in UTF-8

maya fur
Greenhorn

Joined: Sep 08, 2005
Posts: 11
Hi,
I'm trying to find the number of characters (letters) in a word that is in UTF-8 format, not the number of bytes.
String.length() returns the length in UTF-16 code units.
For most words plain String.length() works, but for example:
String s = "文書の場合";
The length (s.length()) is 9.

I'm using the String constructor String(byte[] bytes, String charsetName), but the length method still counts UTF-16 units.
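A minimal sketch of the three different "lengths" involved, assuming Java 7+ for StandardCharsets (on Java 5 the charset name "UTF-8" can be passed as a String instead). Decoding UTF-8 bytes with the String constructor produces an ordinary String, so length() on the result again counts UTF-16 units; the UTF-8 byte count only appears when encoding with getBytes. The literal is written with Unicode escapes so the result does not depend on the encoding the source file is compiled with. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch (hypothetical class name): compares UTF-16 units,
// Unicode code points and UTF-8 bytes for the same text.
class Utf8Lengths {

    // What String.length() counts: UTF-16 code units.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (available since Java 5).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The five characters from the question, written as escapes.
        String s = "\u6587\u66F8\u306E\u5834\u5408";

        // Round trip through UTF-8: the decoded String is the same text.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        System.out.println("UTF-16 units: " + utf16Units(decoded)); // 5
        System.out.println("code points:  " + codePoints(decoded)); // 5
        System.out.println("UTF-8 bytes:  " + utf8Bytes(decoded));  // 15
    }
}
```

Each of these five characters lies between U+0800 and U+FFFF, so each takes one UTF-16 unit but three UTF-8 bytes.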


Thanks
Michael Lloyd Lee
Greenhorn

Joined: Sep 07, 2005
Posts: 22
Can you post the string as Unicode escapes (i.e. \uxxxx)?

How about toCharArray().length?
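A minimal sketch, assuming Java 5+ for codePointCount: toCharArray().length always equals length(), because both count UTF-16 char values. The two only differ from the code point count when the text contains a supplementary character (above U+FFFF), which Java stores as a surrogate pair of two chars. U+20000 is used here purely as an example of such a character; the class name is hypothetical.

```java
// Illustrative sketch (hypothetical class name): char count vs code point count.
class CharVsCodePoint {

    // Always the same value as s.length().
    static int charCount(String s) {
        return s.toCharArray().length;
    }

    // Counts a surrogate pair as a single code point.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+20000, a CJK ideograph outside the BMP.
        String s = "a" + new String(Character.toChars(0x20000));

        System.out.println(s.length());        // 3
        System.out.println(charCount(s));      // 3
        System.out.println(codePointCount(s)); // 2
    }
}
```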


Please please please use code tags!

Java API (http://java.sun.com/j2se/1.5.0/docs/api/) - Java Tutorials (http://java.sun.com/docs/books/tutorial/index.html)
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42038
This article is highly relevant to your question.
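If "letters" means characters as a reader perceives them, even the code point count can be too high: a base letter followed by a combining accent is two code points but one visible character. A minimal sketch using java.text.BreakIterator.getCharacterInstance(), which finds character boundaries that group such sequences; this is a separate illustration, not a summary of the linked article, and the class and method names are hypothetical.

```java
import java.text.BreakIterator;

// Illustrative sketch (hypothetical class name): counts user-perceived
// characters by walking character boundaries with BreakIterator.
class PerceivedLength {

    static int perceivedLength(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s); // also resets the iterator to the first boundary
        int count = 0;
        // Each next() moves to the following character boundary;
        // DONE means the end of the text has been reached.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' followed by COMBINING ACUTE ACCENT (U+0301).
        String decomposed = "e\u0301";
        System.out.println(decomposed.codePointCount(0, decomposed.length())); // 2
        System.out.println(perceivedLength(decomposed));                       // 1
    }
}
```

The result depends on the boundary rules of the JDK in use, so for exotic scripts it is worth checking against the data you actually process.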


maya fur
Greenhorn

Joined: Sep 08, 2005
Posts: 11
Thank you very much for the article! It was very helpful!
 