Is String represented using UTF16?

arfeen khan
Greenhorn

Joined: May 13, 2011
Posts: 25
Hello there,

I read in Java blogs that String objects are represented in UTF-16 format.
Can we prove it with a piece of code?
Meaning, is there any program that can show us that a String is represented in UTF-16?

Thanks in advance,
Arfeen.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7501
    

arfeen khan wrote: I read in Java blogs that String objects are represented in UTF-16 format.

Internally, yes - generally (see below). Characters outside the 16-bit range are stored as "surrogate pairs" (two chars each), which are part of the UTF-16 standard. Strings also do not contain BOMs (Byte Order Marks), since Java's internal byte order is always the same.

Can we prove it with a piece of code? Meaning, is there any program that can show us that a String is represented in UTF-16?

Sure. Bang some text, especially containing some esoteric characters, into a String, and print out the value of each character.
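For what it's worth, here is a minimal sketch of that experiment. The class name Utf16Demo and the helper codeUnits are made up for illustration, and the sample text is just one choice: 'A', an e-acute (U+00E9), and the musical G clef (U+1D11E), which lies outside the 16-bit range. Note that this only shows what the String API exposes (UTF-16 code units); it says nothing definitive about how the JVM stores them.

```java
public class Utf16Demo {
    // Hypothetical helper: returns the chars of s as space-separated 4-digit hex values.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A', U+00E9 and U+1D11E; the last one needs a surrogate pair in UTF-16.
        String s = "A\u00E9\uD834\uDD1E";

        System.out.println(codeUnits(s));                            // 0041 00E9 D834 DD1E
        System.out.println(s.length());                              // 4 chars (code units)
        System.out.println(s.codePointCount(0, s.length()));         // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(2)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(3)));   // true
    }
}
```

Three characters come out as four chars, and the last two are the high and low halves of a surrogate pair, which is what you would expect from UTF-16 code units.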

However, my question would be: Why would you want to? It's clearly stated in the JLS that char "values are 16-bit unsigned integers representing UTF-16 code units". And since Strings are (generally) made up of chars, it stands to reason that Strings are made up of UTF-16 code units.

I say "generally", because I believe you can now specify that Strings use bytes internally to save space; although exactly how that works, I don't know.

Winston


Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 866
    
arfeen khan wrote:
Meaning, is there any program that can show us that a String is represented in UTF-16?


Even if Java represented Strings internally in some other way, say UTF-8 or UTF-32, you could not tell, let alone prove it. The public API exposes chars (UTF-16 code units) but gives no access to the internal storage. You can of course check the source of String, but one could imagine a different implementation of the same API.
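To illustrate, here is a small sketch (the class name EncodingBoundaryDemo and the helper hex are hypothetical) showing that the bytes you see depend entirely on the charset you ask for at the API boundary. The same String yields UTF-8 or UTF-16 bytes on request, and a BOM only appears when the plain UTF-16 charset is used for encoding. None of this reveals how the String is held in memory.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBoundaryDemo {
    // Hypothetical helper: formats bytes as space-separated 2-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\u00E9"; // 'A' followed by U+00E9

        // The encoding is chosen by the caller, not dictated by the String.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // 41 C3 A9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41 00 E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41 00 E9
    }
}
```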
arfeen khan
Greenhorn

Joined: May 13, 2011
Posts: 25
Thank you, Winston Gutkowski, for your reply.

Thank you, Ivan Jozsef Balazs, for the suggestion.
 