UTF-8 bytes (hex) | code point | character name |
---|---|---|
41 | U+0041 | LATIN CAPITAL LETTER A |
C2 A3 | U+00A3 | POUND SIGN |
E0 A4 85 | U+0905 | DEVANAGARI LETTER A |
F0 90 84 B7 | U+10137 | AEGEAN WEIGHT BASE UNIT |
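If you want to reproduce the table above yourself, here is a minimal sketch. The class and method names (`Utf8Table`, `utf8Hex`) are mine, not from anyone's posted code; it only uses standard JDK calls to encode each code point as UTF-8 and print the bytes in hex.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this example only.
class Utf8Table {
    // Returns the UTF-8 bytes of one code point as space-separated uppercase hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x41, 0xA3, 0x905, 0x10137};
        for (int cp : codePoints) {
            // Character.getName depends on the Unicode data shipped with the JDK.
            System.out.printf("%s | U+%04X | %s |%n", utf8Hex(cp), cp, Character.getName(cp));
        }
    }
}
```

The four encodings are 1, 2, 3 and 4 bytes long, which adds up to the 10 bytes in the file.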
Carey Brown wrote: Your hex print could have been simplified to
Carey Brown wrote: Have you tried reading the 10 bytes of your file as raw bytes and printing them out, to see whether they are what you think they are? Or running the file through a hexdump utility?
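A rough sketch of that check, in case it helps. The class name `HexDump` and the method `toHex` are made up for this example, and the file path is whatever you pass on the command line; nothing here is taken from the original poster's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class, named for this example only.
class HexDump {
    // Formats raw bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HexDump.java <file>");
            return;
        }
        // Read the file as raw bytes, with no charset decoding involved.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(bytes.length + " bytes: " + toHex(bytes));
    }
}
```

Reading with `Files.readAllBytes` sidesteps any Reader or charset, so what gets printed is exactly what is on disk.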
Norm Radder wrote: How would the last char, held in 4 bytes that map to 21 bits, fit in a single Unicode character?
form | byte 1 | byte 2 | byte 3 | byte 4 |
---|---|---|---|---|
hex | F0 | 90 | 84 | B7 |
binary | 1111 0000 | 1001 0000 | 1000 0100 | 1011 0111 |
UTF-8 layout | 1111 0xxx | 10xx xxxx | 10xx xxxx | 10xx xxxx |
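The x positions in the layout are the 21 payload bits (3 + 6 + 6 + 6). As a sketch of how they combine into a code point, assuming the four bytes are already known to be a well-formed 4-byte sequence (no validation is done here, and the names `Utf8Decode4` and `decode4` are mine):

```java
// Hypothetical class, named for this example only.
class Utf8Decode4 {
    // Combines the payload bits of a 4-byte UTF-8 sequence into one code point.
    // Assumes b1 matches 11110xxx and b2..b4 match 10xxxxxx; nothing is checked.
    static int decode4(int b1, int b2, int b3, int b4) {
        return ((b1 & 0x07) << 18)   // 3 bits from the lead byte
             | ((b2 & 0x3F) << 12)   // 6 bits from each continuation byte
             | ((b3 & 0x3F) << 6)
             | (b4 & 0x3F);
    }

    public static void main(String[] args) {
        int codePoint = decode4(0xF0, 0x90, 0x84, 0xB7);
        System.out.printf("U+%X%n", codePoint); // prints U+10137
    }
}
```

In real code you would let the JDK do this, for example with `new String(bytes, StandardCharsets.UTF_8)`; the bit masks are only here to show where the 21 bits come from.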
Norm Radder wrote: A Unicode character holds 16 bits.
Norm Radder wrote: What happens when the char requires more bits, like the 4-byte UTF-8 char?
I believe that Java® Strings default to an encoding called UTF-16. Not certain, however.
Richard Hayward wrote: . . . the Java char datatype is 16 bit! . . .
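To see what happens to a code point that needs more than 16 bits, here is a small sketch. The class name `SurrogateDemo` and the helper `utf16Hex` are invented for this example; the JDK calls are standard. A code point above U+FFFF is held in a String as two `char` values (a surrogate pair), so `length()` and `codePointCount` disagree.

```java
// Hypothetical class, named for this example only.
class SurrogateDemo {
    // Returns the UTF-16 code units of one code point as space-separated hex.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10137));
        System.out.println(s.length());                      // prints 2 (char units)
        System.out.println(s.codePointCount(0, s.length())); // prints 1 (code point)
        System.out.println(utf16Hex(0x10137));               // prints D800 DD37
        System.out.printf("U+%X%n", s.codePointAt(0));       // prints U+10137
    }
}
```

So a single `char` cannot hold U+10137, but a String can, and the `codePoint` methods on `String` and `Character` are the ones to use when supplementary characters may be present.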