I am struggling to get a String representation of a 4-byte array that represents an int (written by DataOutputStream). Whenever one of the bytes has the value -127 (hex 0x81), converting the byte array to a String with new String(byte[]) and then calling String.getBytes() turns the value -127 into 63.
I can share the complete code if needed, but I found I can reproduce this with the snippet below:
byte[] bb = new byte[1];
bb[0] = -127;
The Javadoc for the String(byte[]) constructor says:
"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified."
And for getBytes() they wrote:
"The behavior of this method when this string cannot be encoded in the default charset is unspecified."
Next, check what the default charset is on your JVM, for example by printing java.nio.charset.Charset.defaultCharset().
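A minimal sketch of such a check, using the standard Charset.defaultCharset() call (the class and method names here are only illustrative):

```java
import java.nio.charset.Charset;

class ShowDefaultCharset {
    // Name of the charset that new String(byte[]) and String.getBytes()
    // fall back to when no charset is passed explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
    }
}
```

The printed name depends on the JVM version, the operating system locale, and any -Dfile.encoding setting, so it can differ between machines.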
On my computer (Windows XP, code page 1252) the default charset is UTF-8.
If you look at this encoding (http://en.wikipedia.org/wiki/UTF-8), you will see that byte values 128-193 and 245-255 are invalid as the first byte of a character.
Only values 0-127 are "green" (allowed on their own); the rest have special meaning.
Byte 0x81 (binary 10000001) is a continuation byte in this charset: it is valid only after a lead byte, so a single 0x81 on its own is invalid too.
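The role of a byte in UTF-8 can be read from its leading bits. As a rough sketch (the class name Utf8ByteKind and its labels are made up for illustration), this classifies a single byte by its unsigned value:

```java
class Utf8ByteKind {
    // Classifies one byte by the role it can play in a UTF-8 stream.
    static String classify(byte b) {
        int v = b & 0xFF; // Java bytes are signed; mask to get 0..255
        if (v <= 0x7F) return "ASCII (valid on its own)";
        if (v <= 0xBF) return "continuation byte (valid only after a lead byte)";
        if (v <= 0xC1) return "invalid (never appears in UTF-8)";
        if (v <= 0xDF) return "lead byte of a 2-byte sequence";
        if (v <= 0xEF) return "lead byte of a 3-byte sequence";
        if (v <= 0xF4) return "lead byte of a 4-byte sequence";
        return "invalid (never appears in UTF-8)";
    }

    public static void main(String[] args) {
        // -127 is 0x81 as an unsigned value, which falls in 0x80..0xBF
        System.out.println(classify((byte) -127));
    }
}
```

By this classification a lone 0x81 is a continuation byte with nothing before it, which is why a UTF-8 decoder rejects it.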
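If some byte value has no valid representation in your charset, these methods cannot map that byte to a Unicode character. In practice the decoder substitutes the replacement character U+FFFD, and when the charset cannot encode that character either, getBytes() writes '?' (63) in its place, which matches the 63 you are seeing.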
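To make the effect reproducible regardless of the platform default, here is a small sketch that passes the charset explicitly. US-ASCII stands in for "a charset where 0x81 is invalid" and ISO-8859-1 for "a charset where every byte maps to a character"; the class and method names are hypothetical, and the last helper assumes the 4 bytes came from DataOutputStream.writeInt, which writes big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Lossy: 0x81 is not valid US-ASCII, so it decodes to U+FFFD,
    // which US-ASCII cannot encode and therefore writes as '?' (63).
    static byte[] lossyRoundTrip(byte[] in) {
        String s = new String(in, StandardCharsets.US_ASCII);
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Lossless: ISO-8859-1 maps every byte 0..255 to exactly one char and back.
    static byte[] losslessRoundTrip(byte[] in) {
        String s = new String(in, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // If the goal is a readable form of the int, decode the number itself
    // instead of treating the raw bytes as text.
    static String intAsText(byte[] fourBytes) {
        return Integer.toString(ByteBuffer.wrap(fourBytes).getInt());
    }

    public static void main(String[] args) {
        byte[] bb = { -127 };
        System.out.println(Arrays.toString(lossyRoundTrip(bb)));    // [63]
        System.out.println(Arrays.toString(losslessRoundTrip(bb))); // [-127]
        System.out.println(intAsText(new byte[] { 0, 0, 0, -127 })); // 129
    }
}
```

If the bytes are binary data rather than text, it is usually safer to avoid String conversion entirely and keep them as a byte[] (or render them as hex or Base64) until they are read back as an int.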