Which of the following are not valid character constants? Select any two. (a) char c = '\u00001'; (b) char c = '\101'; (c) char c = 65; (d) char c = '\1001';
The answers are (a) and (d). I am not clear about the Unicode notation and this type of character constant. Please suggest a link where I can read an explanation of this answer and of Unicode.
ASCII was only a seven-bit code. Early on, different manufacturers standardized on the low seven bits of a byte as ASCII. Because digital electronics is based on base-2 math, bytes settled on 8 bits (a power of 2), which left one bit over. Computer companies didn't want to waste that eighth bit, so each used it in its own way.
Unicode was proposed as a new standard to replace ASCII and *all* of the many code pages that existed for all of the symbols in all of the languages of the world. Originally, Unicode was a 16-bit standard which yielded 64K characters in the character set which was thought to be big enough to encode everything.
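Java supported the early Unicode Standard from the beginning so, in Java, a char is 16 bits.
In Java, a char is an integral type like byte, short, and int, and an integer constant that fits in its range can be assigned to it directly. The difference is that a char is a 16-bit unsigned integer (0 to 65535), whereas a short is a 16-bit signed integer (-32768 to 32767), so converting between a char variable and a byte or short variable requires an explicit cast.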
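To make the signed/unsigned difference concrete, here is a minimal sketch. The class name CharVersusShort and its two helper methods are made up for this illustration; only the casts themselves are standard Java.

```java
class CharVersusShort {
    // Widens a char to int; the result is always in 0..65535 because char is unsigned.
    static int charToInt(char c) {
        return c;
    }

    // Reinterprets the same 16 bits as a signed short (an explicit cast is required).
    static short charToShort(char c) {
        return (short) c;
    }

    public static void main(String[] args) {
        char max = Character.MAX_VALUE;       // all 16 bits set
        System.out.println(charToInt(max));   // prints 65535
        System.out.println(charToShort(max)); // prints -1

        char a = 65;       // compile-time constant that fits in a char: no cast needed
        int n = 66;
        char b = (char) n; // non-constant int: a cast is required
        System.out.println(a + " " + b);      // prints A B
    }
}
```

The same bit pattern prints as 65535 when read as a char and as -1 when read as a short, which is the whole difference between the two types.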
char c1 = 65; // This is legal. It assigns a positive integer literal to a char: 'A'.
char c2 = '\u0042'; // This is legal. It assigns c2 to be a capital 'B', using the Unicode escape form (hex 42 is decimal 66).
char c3 = 'C'; // This is legal. It assigns c3 to be a capital 'C', using a char literal.
char c4 = '\u00067'; // This is *NOT* legal. A Unicode escape takes exactly 4 hex digits, so this is read as U+0006 followed by '7', and a char literal can hold only one character.
char c5 = '\101'; // This is legal. It is an octal escape: octal 101 is decimal 65, so c5 is 'A'.
char c6 = '\u0101'; // This is also legal, but it is a different character (U+0101; hex 101 is decimal 257). The Unicode-escape equivalent of c5 is '\u0041'.
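char c7 = '\1001'; // This is illegal. An octal escape has at most three digits (up to \377), so this is read as '\100' ('@') followed by '1', which is two characters.
char c8 = '\u1001'; // This is legal, but it is the Unicode character U+1001, which has nothing to do with the octal escape in c7.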
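If you want to check the literal forms yourself, something like the following sketch should compile and run. The class and constant names are invented for the example. The two String constants show what the illegal char literals from the quiz actually contain: two characters each.

```java
class CharLiteralForms {
    static final char FROM_INT = 65;           // integer constant in the char range
    static final char FROM_OCTAL = '\101';     // octal escape: 1*64 + 0*8 + 1 = 65
    static final char FROM_UNICODE = '\u0041'; // Unicode escape: hex 41 = 65
    static final char FROM_LITERAL = 'A';      // plain char literal

    // A String can hold what a char literal cannot: an escape followed by a digit.
    static final String OCTAL_THEN_DIGIT = "\1001";    // octal 100 ('@') followed by '1'
    static final String UNICODE_THEN_DIGIT = "\u00001"; // U+0000 followed by '1'

    public static void main(String[] args) {
        System.out.println(FROM_INT == FROM_OCTAL);      // prints true
        System.out.println(FROM_OCTAL == FROM_UNICODE);  // prints true
        System.out.println(FROM_UNICODE == FROM_LITERAL); // prints true
        System.out.println(OCTAL_THEN_DIGIT);            // prints @1
        System.out.println(OCTAL_THEN_DIGIT.length());   // prints 2
        System.out.println(UNICODE_THEN_DIGIT.length()); // prints 2
    }
}
```

That is why (a) and (d) are the invalid answers: each would put two characters between the single quotes.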
Unicode turned out not to be as simple as having a single encoding for all characters. For example, developers didn't want to spend 16 bits per character when transmitting text over the Internet if an 8-bit encoding was half the size. So several encodings of Unicode exist, notably UTF-8, which uses a single 8-bit byte for ASCII characters and two to four bytes for everything else.
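As a rough illustration of the size difference between encodings, this sketch counts the bytes a string takes in UTF-8 and in UTF-16. The class EncodedLengths and its helper methods are hypothetical names; String.getBytes and StandardCharsets are standard library (Java 7 and later).

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as big-endian UTF-16 (no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "A";       // plain ASCII letter
        String accented = "\u00e9"; // e with acute accent
        String euro = "\u20ac";     // euro sign

        System.out.println(utf8Length(ascii) + " vs " + utf16Length(ascii));       // prints 1 vs 2
        System.out.println(utf8Length(accented) + " vs " + utf16Length(accented)); // prints 2 vs 2
        System.out.println(utf8Length(euro) + " vs " + utf16Length(euro));         // prints 3 vs 2
    }
}
```

So UTF-8 wins for mostly-ASCII text, and can actually be larger than UTF-16 for text made up mostly of characters like the euro sign.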
In my opinion, Unicode is good because it does simplify things over the older standards where you could only use one code-page at a time and were therefore limited to 256 chars. Unicode 1.0 through 3.0 are much better, allowing up to 64K characters to be encoded in a 16-bit char.
Later Unicode versions (starting with 3.1) broke the 16-bit limit, but this is only an issue for supplementary characters, such as relatively rare characters for Asian languages, that don't fit into 16 bits.
You can use the char type for Unicode 1.0 through Unicode 3.0.
If you want to break the 16-bit limit for chars, you use an int, which is 32 bits; Unicode nowadays uses only 21 of them (code points run from U+0000 to U+10FFFF).
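Here is a small sketch of working with a code point held in an int. The class name and the fromCodePoint helper are made up for the example; Character.toChars, Character.charCount, String.codePointAt and String.codePointCount are standard since Java 5. I picked U+1D11E (the musical G clef) only because it lies outside the 16-bit range.

```java
class SupplementaryCodePoint {
    // Builds a String from a single code point held in an int.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, does not fit in one char
        String s = fromCodePoint(clef);

        System.out.println(Character.charCount(clef));            // prints 2
        System.out.println(s.length());                           // prints 2 (chars)
        System.out.println(s.codePointCount(0, s.length()));      // prints 1 (character)
        System.out.println(s.codePointAt(0) == clef);             // prints true
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // prints true
    }
}
```

The string holds one character but two chars (a surrogate pair), so length() and the number of characters can differ.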
So Unicode literally means "one-code", but it is not one code. It is better than having innumerable code pages that you could only work with one at a time, but Unicode really can be encoded in a few different ways (such as UTF-8, UTF-16, and UTF-32). [ February 28, 2008: Message edited by: Kaydell Leavitt ]
And if you're not confused enough yet, have a read of this post and this post, both of which explain what happens if one character isn't the same as one char. That's advanced stuff, but it's good to keep in the back of your head that this can happen. [ February 28, 2008: Message edited by: Ulf Dittmer ]