arfeen khan wrote: I read in Java blogs that String objects are represented in UTF-16 format.
Internally, yes, generally (see below). Characters outside the Basic Multilingual Plane are stored as "surrogate pairs", which are part of the UTF-16 standard. Strings also do not contain BOMs (Byte Order Marks), since Java's internal byte order is always the same.
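The BOM point can be checked from code: a BOM only shows up when a String is encoded to bytes, not in the String itself. The following is a minimal sketch; the class name `BomDemo` and the helper `hex` are made up for illustration, while `StandardCharsets.UTF_16`, `UTF_16BE` and `UTF_16LE` are standard JDK charsets.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Formats bytes as space-separated upper-case hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";
        // The String itself holds exactly one code unit and no BOM.
        System.out.println(s.length());                                  // 1
        // Encoding with the "UTF-16" charset writes a big-endian BOM first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));    // FE FF 00 41
        // The explicit-endian charsets write no BOM at all.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));  // 00 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE)));  // 41 00
    }
}
```

So byte order and BOMs are a property of the encoded byte stream, chosen at the moment you call `getBytes`, not of the String object.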
Can we prove it with a piece of code? Meaning, any program that can show us that String is represented in UTF-16.
Sure. Bang some text, especially containing some esoteric characters, into a String, and print out the value of each character.
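Something along these lines would do it. This is only a sketch: the class name `CodeUnitDump` and the method `dump` are invented for the example, and the sample text is an arbitrary choice that includes one character outside the Basic Multilingual Plane so that a surrogate pair shows up.

```java
class CodeUnitDump {
    // Returns the UTF-16 code units of s as 4-digit hex values separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A", e-acute (U+00E9), euro sign (U+20AC), musical G clef (U+1D11E)
        String text = "A\u00E9\u20AC\uD834\uDD1E";
        System.out.println(dump(text));                             // 0041 00E9 20AC D834 DD1E
        System.out.println(text.length());                          // 5 code units
        System.out.println(text.codePointCount(0, text.length()));  // 4 code points
    }
}
```

The G clef occupies two chars (D834 DD1E), which is why `length()` reports 5 while `codePointCount` reports 4. Note that this shows what the String API exposes, which is not necessarily how the JVM stores the data.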
However, my question would be: Why would you want to? It's clearly stated in the JLS that char "values are 16-bit unsigned integers representing UTF-16 code units". And since Strings are (generally) made up of chars, it stands to reason that Strings are made up of UTF-16 code units.
I say "generally" because I believe you can now specify that Strings use bytes internally to save space, although I don't know exactly how that works.
arfeen khan wrote: Meaning any program that can show us that String is represented by UTF-16.
Even if Java internally represented Strings some other way, say in UTF-8 or UTF-32, you could not tell or prove it from code. The API does not give access to the internal representation. You can of course check the source of String, but one could imagine a different implementation of the same API.
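To make the "different implementation of the same API" idea concrete, here is a rough sketch of a text class that stores UTF-8 bytes internally yet answers `length()` and `charAt()` in UTF-16 code units, just as String does. `Utf8BackedText` is a hypothetical class written for this post, not anything in the JDK, and it assumes well-formed input (an unpaired surrogate would not survive the round trip through UTF-8).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration only; not part of the JDK.
final class Utf8BackedText implements CharSequence {
    private final byte[] utf8;   // internal storage is UTF-8, not UTF-16

    Utf8BackedText(String s) {
        // Assumes s is well-formed (no unpaired surrogates).
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes on every call: simple rather than efficient.
    private String decode() {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    @Override public int length() { return decode().length(); }

    @Override public char charAt(int index) { return decode().charAt(index); }

    @Override public CharSequence subSequence(int start, int end) {
        return new Utf8BackedText(decode().substring(start, end));
    }

    @Override public String toString() { return decode(); }

    public static void main(String[] args) {
        String original = "A\u20AC\uD834\uDD1E";
        CharSequence text = new Utf8BackedText(original);
        // Through the API, the caller sees the same UTF-16 code units as with String.
        System.out.println(text.length() == original.length());        // true
        System.out.println(text.charAt(2) == original.charAt(2));      // true
        System.out.println(text.toString().equals(original));          // true
    }
}
```

A caller working only through `CharSequence` cannot distinguish this from a char-backed implementation, which is the reason an experiment with `charAt` says something about the API contract and nothing about the storage.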