Unicode conversion

R K Singh
Ranch Hand

Joined: Oct 15, 2001
Posts: 5371
QUESTION: The JLS describes a transformation that converts all Unicode escapes in the source text of the program to ASCII by adding an extra 'u', while simultaneously converting non-ASCII characters in the source text to a \uXXXX escape containing a single 'u'.
What does it mean by a non-ASCII character, and what happens to the ASCII characters present in the source text?
Thanks in advance
------------------
Regards
Ravish


Valentin Crettaz
Gold Digger
Sheriff

Joined: Aug 26, 2001
Posts: 7610
As far as I know, ASCII characters are the ones between 0 and 127, since ASCII uses 7 bits.
So every character greater than \u007F (127) is a non-ASCII character.
Anyone else?
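
A minimal sketch of that 7-bit rule, in case it helps. The class name AsciiCheck and the method isAscii are made up for illustration; nothing here comes from the JLS itself.

```java
class AsciiCheck {
    // A char counts as ASCII when its numeric value fits in 7 bits (0 to 127).
    static boolean isAscii(char c) {
        return c <= 0x7F;
    }

    // A string is pure ASCII when every one of its chars passes the check above.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isAscii(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, isAscii('A') is true, while isAscii((char) 0xE9) is false because 233 does not fit in 7 bits.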
------------------
Valentin Crettaz
Sun Certified Programmer for Java 2 Platform


R K Singh
Ranch Hand

Joined: Oct 15, 2001
Posts: 5371
Do we use any characters other than ASCII in a Java source file?

------------------
Regards
Ravish
Valentin Crettaz
Gold Digger
Sheriff

Joined: Aug 26, 2001
Posts: 7610
You can if you want to. You can write the Unicode escapes directly within your code. This way you could write some Japanese text or whatever without having those characters on your keyboard. For more information, see http://www.unicode.org
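
A small sketch of what that can look like. The class name EscapeDemo and the method greeting are hypothetical; the two escapes are the hiragana characters U+306F and U+3044.

```java
class EscapeDemo {
    // Returns a two-character Japanese string written with Unicode escapes only,
    // so the source file itself stays pure ASCII.
    static String greeting() {
        return "\u306F\u3044";
    }
}
```

The source contains only ASCII characters, yet the resulting string holds two non-ASCII chars.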
HIH
------------------
Valentin Crettaz
Sun Certified Programmer for Java 2 Platform
R K Singh
Ranch Hand

Joined: Oct 15, 2001
Posts: 5371
So it means that Unicode escapes get an extra 'u' added, and non-ASCII characters are converted into Unicode escapes.
And ASCII characters are converted into normal int values.
Is my guess right?
------------------
Regards
Ravish
Jose Botella
Ranch Hand

Joined: Jul 03, 2001
Posts: 2120
This is interesting (from JLS 3.1)
"
Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a program are formed only from ASCII characters (or Unicode escapes (§3.3) which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
"
and JLS 3.2
"
* A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.
"
So I guess that ASCII characters are not translated to anything because they are already Unicode characters.
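
A quick way to convince yourself of this, as a sketch only (the class name AsciiIsUnicode and its methods are made up for illustration): an ASCII character typed directly and the same character written as a Unicode escape are the same char value.

```java
class AsciiIsUnicode {
    // The letter A typed directly and the escape for U+0041 denote the same char.
    static boolean sameChar() {
        return 'A' == '\u0041';
    }

    // Widening the char to int exposes its Unicode (and ASCII) code, 65.
    static int codeOfA() {
        return 'A';
    }
}
```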

R K Singh
Ranch Hand

Joined: Oct 15, 2001
Posts: 5371
Thanks Botella
So should I conclude that ASCII remains as ASCII, I mean "unsigned int", and anything other than ASCII (Unicode escapes or "comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5)") is converted into a Unicode char which contains a single 'u'?
Does it mean that ASCII does not contain 'u' and is a simple unsigned int?
I think YES
CMIW
------------------
Regards
Ravish
Jose Botella
Ranch Hand

Joined: Jul 03, 2001
Posts: 2120

The compiler internally works with Unicode characters. This has nothing to do with Java types, and neither does "unsigned int".
The compiler accepts a program made of Unicode characters (or escapes), or a Java program composed entirely of a sequence of Unicode escapes as described in JLS 3.3. The latter is possible because Unicode escapes are themselves written in ASCII characters. The compiler also accepts a source program with only ASCII characters, because ASCII characters are also Unicode characters.
The first lexical translation (made by the compiler, I guess) is to change Unicode escapes into the Unicode characters they represent.
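
One way to see that this translation happens before anything else is recognised: an escape can even stand in for a letter of a keyword. This is only a sketch (the class name EarlyTranslation and the method answer are hypothetical), not something you would write in real code.

```java
class EarlyTranslation {
    static int answer() {
        // The escape on the next line stands for the letter i (U+0069),
        // so after the first translation step the compiler sees a normal int declaration.
        \u0069nt value = 42;
        return value;
    }
}
```

This compiles because the escape is replaced by its character before the source is split into tokens.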

So should I conclude that ASCII remains as ASCII, I mean "unsigned int", and anything other than ASCII (Unicode escapes or "comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5)") is converted into a Unicode char which contains a single 'u'?

Not really. You can write an ASCII char with Unicode escape notation: \u0000 is the ASCII null. Unicode escapes are used to write characters that cannot be typed directly in the editor, and to translate a Java program written in Unicode into an ASCII one.
I guess the compiler is able to accept Unicode characters greater than 127 in the content of String and char literals, and inside identifiers and comments. For all the rest, ASCII (again, Unicode characters less than 128) is expected. But this doesn't mean that 'u' is used. A Unicode character is a value between 0000 and FFFF, and Unicode escapes are used for the situations mentioned above.
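
To make the Unicode-to-ASCII translation from the original question concrete, here is a rough sketch of it. The class name AsciiTransform and the method toAscii are made up for illustration, and it is not taken from any real compiler or tool. It assumes the input is a program's source text held in a String, and it ignores corner cases such as a non-ASCII character immediately following a backslash.

```java
class AsciiTransform {
    // Rewrites source text so that it contains only ASCII characters:
    //  - an existing Unicode escape gets one extra 'u' after its backslash;
    //  - a non-ASCII char becomes a single-u escape with four hex digits;
    //  - every other ASCII char is copied unchanged.
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // length of the backslash run just before the current char
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
                backslashes = 0;
            } else if (c == '\\') {
                out.append(c);
                backslashes++;
            } else {
                // An odd-length backslash run followed by 'u' starts a Unicode escape.
                if (c == 'u' && backslashes % 2 == 1) {
                    out.append('u');
                }
                out.append(c);
                backslashes = 0;
            }
        }
        return out.toString();
    }
}
```

So plain ASCII text comes out untouched, an existing escape gains a second 'u', and a character above 127 comes out as an escape with a single 'u', which is how the original form can be told apart from the converted one later.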
I hope it helps.
 