I have seen some technical articles say that Java uses UTF-16 as its internal encoding. I am getting frustrated by the getBytes(String charset) method and the String(byte[], String charset) constructor of the String class, so I would like to get this clear.
For getBytes(String charset), the javadoc says that it returns a byte array encoded using the specified charset. Does that mean that if I have a string in Big5 encoding and I execute str.getBytes("BIG5"), I am telling the JVM that the string is in Big5, so that it converts the string from Big5 to UTF-16 and stores it in memory as UTF-16? Or does it mean that the resulting byte[] is in Big5 format?
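One way to check which reading is right is to print the bytes that getBytes returns for different charset names. The following is only a minimal sketch: the class name GetBytesCheck and the helpers encode and hex are made up for illustration, and it assumes the JVM ships the Big5 charset (it is an extended charset, so a stripped-down runtime may not have it). If the second reading is correct, the same in-memory string should give different byte sequences depending on the charset named in the call.

```java
import java.io.UnsupportedEncodingException;

public class GetBytesCheck {
    // Hypothetical helper: encodes the in-memory string into bytes of the named charset.
    // The checked exception is wrapped so callers do not need a throws clause.
    static byte[] encode(String s, String charsetName) {
        try {
            return s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    // Hypothetical helper: formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String str = "\u4E2D"; // the single character U+4E2D, written as an escape to avoid source-file encoding issues
        System.out.println("Big5:     " + hex(encode(str, "Big5")));
        System.out.println("UTF-8:    " + hex(encode(str, "UTF-8")));
        System.out.println("UTF-16BE: " + hex(encode(str, "UTF-16BE")));
    }
}
```

If the three lines print different byte sequences for the same str, that would suggest the charset argument describes the output bytes rather than the string itself.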
Furthermore, if I have another string in Big5 and a database whose encoding is UTF-8, is the following statement the correct way to store the string in the database properly as UTF-8?
new String(str.getBytes("Big5"), "UTF-8");
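To see what that statement actually produces, it can be compared against the original string. Again this is just a sketch under assumptions: the class name ReencodeCheck and the methods mismatched and matched are hypothetical, and Big5 must be available in the JVM. The idea is that if the bytes are produced in one charset and then decoded as another, the result may no longer equal the original, whereas using the same charset on both sides should round-trip.

```java
import java.io.UnsupportedEncodingException;

public class ReencodeCheck {
    // The statement from the question: encode to Big5 bytes, then decode those bytes as if they were UTF-8.
    static String mismatched(String s) {
        try {
            return new String(s.getBytes("Big5"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available", e);
        }
    }

    // For comparison: encode and decode with the same charset.
    static String matched(String s) {
        try {
            return new String(s.getBytes("Big5"), "Big5");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available", e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // two Chinese characters, written as escapes
        // Compare each result with the original string.
        System.out.println("Big5 bytes read as UTF-8 equals original: " + original.equals(mismatched(original)));
        System.out.println("Big5 bytes read as Big5 equals original:  " + original.equals(matched(original)));
    }
}
```

On a JVM with Big5 support I would expect the first comparison to print false and the second to print true, which would mean the statement above damages the text instead of converting it.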
I am really frustrated, and maybe I am not asking my question clearly; I apologise for any inconvenience caused. I hope someone can clear up my confusion. Thanks a lot.
[ July 04, 2006: Message edited by: Taka Chan ]