I was looking at the methods in the class Character. I need a simple method to change a char to an int in Unicode. But there is no such method. I just found a few methods, including one called toCodePoint(char high, char low), which deal with surrogate pairs. According to the API, toCodePoint "converts the specified surrogate pair to its supplementary code point value." Why can't you just change a char to a number? And what are surrogate pairs?
Originally posted by Kevin Tysen: ... Why can't you just change a char to a number? ...
Like this...?
char c = 'x'; int i = c;
The original Unicode specification used 16-bit values, and this is what Java's char type was based on. This allows for values from 0 to 65535 (that is, 0 to 2^16 - 1), which are written as U+0000 through U+FFFF.
However, the Unicode specification has since been expanded to allow for values up to U+10FFFF, with values above the 16-bit limit of U+FFFF called "supplementary characters." In Java, supplementary characters are represented as a pair of char values. The first of these is called the "high surrogate" and the second is the "low surrogate."
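To make the surrogate pair idea concrete, here is a minimal sketch using U+1D11E (MUSICAL SYMBOL G CLEF) as the example supplementary character. The class name SurrogatePairDemo and the helper combine are made up for illustration; Character.toChars, Character.isSurrogatePair and Character.toCodePoint are the standard methods available since Java 5.

```java
public class SurrogatePairDemo {

    // Hypothetical helper: joins a high and a low surrogate into one code point.
    static int combine(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // above U+FFFF, so it cannot fit in a single char

        // Split the code point into its two char values.
        char[] pair = Character.toChars(clef);
        System.out.println(pair.length);                          // 2
        System.out.println(Character.isHighSurrogate(pair[0]));   // true
        System.out.println(Character.isLowSurrogate(pair[1]));    // true
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D834 DD1E

        // Join the two halves back into the original code point.
        System.out.printf("U+%X%n", combine(pair[0], pair[1]));   // U+1D11E
    }
}
```

This is why toCodePoint takes two arguments: it only makes sense for a supplementary character, where neither char is meaningful on its own.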
For a char value that is not part of a surrogate pair, you can simply widen to type int with an assignment conversion (as shown above). [ August 06, 2007: Message edited by: marc weber ]
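If you do not know in advance whether a String contains supplementary characters, one option is to read it by code point instead of by char. The following is only a sketch: the class name CodePointWalk and the helper toCodePoints are invented for this example, while String.codePointAt, String.codePointCount and Character.charCount are standard since Java 5.

```java
public class CodePointWalk {

    // Hypothetical helper: returns the Unicode code points of s, in order.
    static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads one or two chars as needed
            result[n++] = cp;
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', then U+1D11E written as a surrogate pair, then 'b'
        String s = "a\uD834\uDD1Eb";

        System.out.println(s.length());     // 4 char values
        for (int cp : toCodePoints(s)) {
            System.out.printf("U+%04X%n", cp);
        }
        // prints U+0061, U+1D11E, U+0062

        // For a char outside the surrogate range, plain widening is enough.
        char c = 'x';
        int i = c;
        System.out.println(i);              // 120
    }
}
```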
Ulf Dittmer
There are a couple of blog posts that give a brief introduction to surrogate pairs and how to handle them: one by Tom White and one by John O'Conner.