QUESTION: The JLS says that it converts all Unicode escapes in the source text of the program to ASCII by adding an extra 'u', while simultaneously converting non-ASCII characters in the source text to a \uXXXX escape containing a single 'u'. What does it mean by a non-ASCII character, and what happens to the ASCII characters present in the source text? Thanks in advance. Regards, Ravish
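The transformation the question quotes (JLS §3.3) can be sketched roughly like this. It is only an illustration under my own assumptions: the class and method names are hypothetical, and it assumes the input is already a valid source text. A backslash counts as the start of an existing escape only if it is preceded by an even number of backslashes, as the JLS describes.

```java
public class UnicodeToAscii {

    // Hypothetical helper sketching the JLS 3.3 Unicode-to-ASCII transformation:
    // every existing Unicode escape gains one extra 'u', and every non-ASCII
    // char (above 0x7F) becomes an escape with a single 'u'.
    static String toAscii(String src) {
        StringBuilder out = new StringBuilder();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '\\') {
                // Copy the whole run of contiguous backslashes.
                int start = i;
                while (i < n && src.charAt(i) == '\\') {
                    i++;
                }
                out.append(src, start, i);
                // The last backslash of the run can start an escape only if it is
                // preceded by an even number of backslashes, i.e. the run is odd.
                if ((i - start) % 2 == 1 && i < n && src.charAt(i) == 'u') {
                    out.append('u'); // the extra 'u'; the original u's follow
                }
            } else if (c > 0x7F) {
                // Non-ASCII: emit a backslash, one 'u' and four hex digits.
                out.append(String.format("\\u%04X", (int) c));
                i++;
            } else {
                out.append(c); // ASCII is copied unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // At run time this string is: c a f e-acute space backslash u 0 0 4 1
        String src = "caf\u00E9 \\u0041";
        System.out.println(toAscii(src));
    }
}
```

For that input the output should be caf\u00E9 \uu0041: the é becomes a single-u escape, the existing escape gets a second u, and the ASCII letters come out unchanged.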
As far as I know, ASCII characters are the ones from 0 to 127, since ASCII uses 7 bits. So every character above \u007F (127) is a non-ASCII character. Anyone else? Valentin Crettaz, Sun Certified Programmer for Java 2 Platform
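A quick way to check the 0 to 127 rule in Java is to compare a char against 0x7F. The class and method names below are just for illustration.

```java
public class AsciiCheck {

    // ASCII is the first 128 Unicode characters, 0x00 through 0x7F (7 bits).
    static boolean isAscii(char c) {
        return c <= 0x7F;
    }

    public static void main(String[] args) {
        // 'A', e-acute (U+00E9) and 'z', written with an escape for the accent
        String text = "A\u00E9z";
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X ascii=%b%n", (int) c, isAscii(c));
        }
    }
}
```

This should print U+0041 ascii=true, U+00E9 ascii=false and U+007A ascii=true, one per line.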
Can we use characters other than ASCII in a Java source file?
Regards, Ravish
Joined: Aug 26, 2001
You can if you want to. You can write Unicode characters directly in your code, as \uXXXX escapes if necessary. This way you can write Japanese text or anything else without having those characters on your keyboard. For more information, see http://www.unicode.org. Hope it helps. Valentin Crettaz, Sun Certified Programmer for Java 2 Platform
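For example, the Japanese word for "Japan" can be typed using only ASCII keys by writing its two characters as escapes. This is a small sketch with a made-up class name. It prints code points instead of the text itself, because whether Japanese shows up correctly depends on the console.

```java
public class JapaneseLiteral {

    // The Japanese word for "Japan" (nihon), typed with ASCII-only Unicode
    // escapes: U+65E5 followed by U+672C.
    static final String NIHON = "\u65E5\u672C";

    public static void main(String[] args) {
        System.out.println(NIHON.length()); // prints 2
        for (char c : NIHON.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+65E5, then U+672C
        }
    }
}
```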
R K Singh
Joined: Oct 15, 2001
So it means that Unicode escapes get an extra 'u' added, non-ASCII characters are converted into Unicode escapes, and ASCII characters are converted into normal int values. Is my guess right? Regards, Ravish
This is interesting (from JLS 3.1): "Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters."
And from JLS 3.2: "A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."
So I guess that ASCII characters are not translated into anything, because they are already Unicode characters.
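The 3.1 quote allows code outside literals, comments and identifiers to use escapes as long as they result in ASCII characters. Here is a small sketch, with a made-up class name, where an escape stands for a plain ASCII letter in a declaration.

```java
public class EscapeInCode {

    static int compute() {
        // The escape below is translated into the ASCII letter 'a' before the
        // source is split into tokens, so this declares an ordinary variable a.
        int \u0061 = 5;
        return a + 1;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 6
    }
}
```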
R K Singh
Joined: Oct 15, 2001
Thanks, Botella. So should I conclude that ASCII remains ASCII, I mean an "unsigned int", and that anything other than ASCII (Unicode escapes, or "comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5)") is converted into a Unicode escape containing a single 'u'? Does it mean that ASCII does not contain a 'u' and is a simple unsigned int? I think yes; correct me if I'm wrong. Regards, Ravish
Joined: Jul 03, 2001
The compiler works internally with Unicode characters. This has nothing to do with Java types, and neither does "unsigned int". The compiler accepts a program written in Unicode characters (possibly with escapes), or a Java program made up entirely of Unicode escapes as described in JLS 3.3. The latter is possible because Unicode escapes are themselves written with ASCII characters. The compiler accepts a source program containing only ASCII characters because ASCII characters are also Unicode characters. The first lexical translation (performed by the compiler, I guess) is to change Unicode escapes into the Unicode characters they represent.
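That first lexical translation can be sketched like this. The names are hypothetical and this is not how javac is actually written. Like the JLS, it accepts one or more u's and treats a backslash as the start of an escape only when it is preceded by an even number of raw backslashes. To keep things simple, a backslash produced by an escape is not counted toward that number.

```java
public class UnicodeEscapeDecoder {

    // Value of one hex digit, or -1 if ch is not 0-9, a-f or A-F.
    static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    // Hypothetical sketch of the first lexical translation (JLS 3.3):
    // each Unicode escape is replaced by the char it denotes.
    static String decode(String src) {
        StringBuilder out = new StringBuilder();
        int n = src.length();
        int i = 0;
        int rawBackslashes = 0; // contiguous raw backslashes right before i
        while (i < n) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && rawBackslashes % 2 == 0;
            if (eligible && i + 1 < n && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < n && src.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                if (j + 4 > n) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = hexValue(src.charAt(k));
                    if (d < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + d;
                }
                out.append((char) value);
                i = j + 4;
                rawBackslashes = 0; // simplification: the decoded char is not counted
                continue;
            }
            rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0048\\u0069")); // prints Hi
    }
}
```

Plain ASCII characters pass through unchanged, which matches the point that they need no translation.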
"So should I conclude that ASCII remains ASCII, I mean an "unsigned int", and that anything other than ASCII (Unicode escapes, or "comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5)") is converted into a Unicode escape containing a single 'u'?"
Not really. You can write an ASCII character in Unicode escape notation: \u0000 is the ASCII null. Unicode escapes are used to write characters that your editor cannot enter directly, and to translate a Java program written in Unicode into one written only in ASCII. I guess the compiler accepts Unicode characters greater than 127 in the contents of String and char literals, and inside identifiers and comments. Everywhere else ASCII (again, Unicode characters below 128) is expected. But this doesn't mean that a 'u' is involved. A Unicode escape denotes a value between 0000 and FFFF, and escapes are only used in the situations described above. I hope it helps.
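To illustrate the first point: an escape that names an ASCII character gives exactly the same char as typing that character. This is a small sketch with a made-up class name.

```java
public class EscapeValues {

    // Both constants hold the same char: the escape is turned into 'A'
    // (U+0041) before the char literal is tokenized.
    static final char VIA_ESCAPE = '\u0041';
    static final char DIRECT = 'A';

    // The escape for the ASCII null gives the char whose value is 0.
    static final char NUL = '\u0000';

    public static void main(String[] args) {
        System.out.println(VIA_ESCAPE == DIRECT); // prints true
        System.out.println((int) NUL); // prints 0
        // A Java char holds one UTF-16 code unit, 0x0000 through 0xFFFF,
        // which is exactly the range four hex digits can express.
        System.out.printf("%04X..%04X%n", (int) Character.MIN_VALUE, (int) Character.MAX_VALUE);
    }
}
```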