
What is the range for unicode values in char data type?

Leandro Melo
Ranch Hand

Joined: Mar 27, 2004
Posts: 401
What is the range of Unicode values that I can use to initialize a char?


Leandro Melo
SCJP 1.4, SCWCD 1.4
Mike Gershman
Ranch Hand

Joined: Mar 13, 2004
Posts: 1272
A char can range from 0 to 65535. I don't know how many of these values have been assigned a Unicode graphic.
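
Those limits can be checked directly against the constants in java.lang.Character. This is only a minimal sketch; the class name CharRange and its helper methods are made up for illustration.

```java
class CharRange {
    // Smallest and largest values a char can hold, widened to int.
    static int min() { return Character.MIN_VALUE; }
    static int max() { return Character.MAX_VALUE; }

    public static void main(String[] args) {
        System.out.println(min() + " to " + max()); // prints "0 to 65535"
        char c = 65535; // largest constant that fits; 65536 would not compile without a cast
        System.out.println((int) c);
    }
}
```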


Mike Gershman
SCJP 1.4, SCWCD in process
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

You might be interested in the method, Character.isDefined(char ch), which returns a boolean depending on whether the argument char is defined in Unicode.

In general, you'll find that within the range of possible char values, there are numerous Unicode gaps. For example, \u0237 through \u0249 are not defined. You can assign these values to a char, but they won't translate to Unicode characters.
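
A small sketch of how Character.isDefined could be used to probe individual values. The class and method names here are hypothetical, and which values count as gaps depends on the Unicode version bundled with your Java release, since later Unicode versions have assigned some code points that were unassigned earlier.

```java
class DefinedCheck {
    // True if the running JVM's Unicode tables have an entry for this value.
    static boolean defined(int value) {
        return Character.isDefined((char) value);
    }

    public static void main(String[] args) {
        char gap = 0x0237; // any value from 0 to 65535 can be assigned to a char
        System.out.println(defined('A'));  // an ordinary letter is defined
        System.out.println(defined(gap));  // result depends on the Unicode version in use
    }
}
```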

Also note that as of Java 1.5, some of the values within the char range are used for "surrogate pairs," which allow representation of supplementary characters -- that is, Unicode characters with code points greater than U+FFFF. A single 16-bit char holding one of these surrogate values (\uD800 - \uDFFF) does not represent a character on its own.

"...supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF). A char value, therefore, represents Basic Multilingual Plane (BMP) code points [\u0000 to \uFFFF], including the surrogate code points..."

Ref: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html
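
To make the quoted passage concrete, here is a rough sketch that splits one supplementary code point into its surrogate pair using methods added to Character in Java 1.5. The class name SurrogateDemo and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) are just for illustration.

```java
class SurrogateDemo {
    // Converts a code point to its UTF-16 form: one char for the BMP, two for supplementary.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1D11E);
        System.out.println(pair.length);                         // 2
        System.out.println(Character.isHighSurrogate(pair[0]));  // true
        System.out.println(Character.isLowSurrogate(pair[1]));   // true
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
        // Recombine the two halves into the original code point.
        System.out.println(Integer.toHexString(Character.toCodePoint(pair[0], pair[1])));
    }
}
```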
[ February 24, 2005: Message edited by: marc weber ]

"We're kind of on the level of crossword puzzle writers... And no one ever goes to them and gives them an award." ~Joe Strummer
sscce.org
marc weber
Sheriff

Joined: Aug 31, 2004
Posts: 11343

Originally posted by Mike Gershman:
A char can range from 0 to 65535. I don't know how many of these values have been assigned a Unicode graphic.

I count 59177.
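
The post does not say how that figure was obtained. One plausible way, sketched below under that assumption, is to loop over every char value and ask Character.isDefined. The class name DefinedCount is invented here, and the total will differ between Java releases because each ships a different Unicode version.

```java
class DefinedCount {
    // Counts how many of the 65536 possible char values Character.isDefined accepts.
    static int countDefined() {
        int count = 0;
        // Loop with an int: a char counter would wrap around and never exceed MAX_VALUE.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (Character.isDefined((char) i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDefined());
    }
}
```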
Leandro Melo
Ranch Hand

Joined: Mar 27, 2004
Posts: 401
I know they range from 0 to 65535 in terms of integers, but I would really like to know in terms of Unicode characters.
The point is that I've seen questions on mock exams that ask me, for example, whether

char a = '\u000d'

is valid. In this case it's not, although it really looks like it would be all right. I also checked that

char b = '\u101'

is not valid either. Is this weird or not? So, am I supposed to memorize all valid Unicode initializations for the exam?
Mike Gershman
Ranch Hand

Joined: Mar 13, 2004
Posts: 1272
Originally posted by Leandro Melo:
The point is that I've seen questions on mock exams that ask me, for example, whether

char a = '\u000d'

is valid. In this case it's not, although it really looks like it would be all right.

'\u000d' is the carriage return character (not 'a') and is a legal Unicode character. However, '\u000d' and '\u000a' (line feed) should not appear anywhere in a Java source file, because the compiler translates Unicode escapes before it does anything else and then treats the result as an actual line break, splitting your statement across two lines. Use '\r' and '\n' instead.
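
As a quick sanity check, the sketch below confirms that '\r' and '\n' carry the same numeric values as the two escapes that cannot be written directly. The class name LineBreakChars is hypothetical. The comments deliberately spell out "backslash-u" instead of using the escape, since the compiler would translate it even inside a comment. For what it's worth, '\u101' from the question fails for a different reason: a Unicode escape needs exactly four hex digits.

```java
class LineBreakChars {
    // The escape sequences give the same values as backslash-u 000d and backslash-u 000a,
    // but they are resolved later, so they cannot break the source line.
    static int cr() { return '\r'; }
    static int lf() { return '\n'; }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(cr())); // prints "d"
        System.out.println(Integer.toHexString(lf())); // prints "a"
        char viaCast = (char) 0x000D; // a numeric cast is another safe way to get the same char
        System.out.println(viaCast == '\r'); // prints "true"
    }
}
```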

If you really want to learn some Unicode, just remember those two, plus a few more: '\u0020' is a space; digits start at '\u0030' for 0, '\u0031' for 1, and so on; capital letters start at '\u0041' for A; and lowercase letters start at '\u0061' for a. That is more than enough for the SCJP exam and for ordinary programming in the English language.
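
Those starting points can be verified with a few lines of char arithmetic. This is only an illustrative sketch, and the class and method names are invented for it.

```java
class AsciiLandmarks {
    // Each helper offsets from the first character of its block.
    static char digit(int n) { return (char) ('\u0030' + n); } // n in 0..9
    static char upper(int n) { return (char) ('\u0041' + n); } // n in 0..25
    static char lower(int n) { return (char) ('\u0061' + n); } // n in 0..25

    public static void main(String[] args) {
        System.out.println('\u0020' == ' '); // prints "true"
        System.out.println(digit(0) + " " + digit(1)); // prints "0 1"
        System.out.println(upper(0) + " " + lower(0)); // prints "A a"
    }
}
```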
Leandro Melo
Ranch Hand

Joined: Mar 27, 2004
Posts: 401
Thanks for the explanation Mike, you got to the point I need.
 