Unicode Value????

 
rajashree ghatak
Ranch Hand
Posts: 151
Hi all,
The following code gives a compile-time error:
char var='\u000a';
System.out.print("Hi");
System.out.print(var);
System.out.println("All");
'\u000a' is the Unicode value for new line according to p. 26 of Khalid Mughal's book. The compile error says: "Invalid Character Constant".
But if we assign '\n' to var, the program compiles and executes fine.
A similar error occurs when we use '\u005c' in place of '\\' for the backslash escape sequence.
Can someone explain why this error occurs?
Thanks in advance,
rajashree.
 
V Srinivasan
Ranch Hand
Posts: 99
We can't print white space or escape sequence characters.
 
rajashree ghatak
Ranch Hand
Posts: 151
Srinivasan,
Why can't we print escape sequence characters?
Aren't '\n', '\t', '\b', '\"' and '\f' examples of escape sequence characters?
'\u0020' is the Unicode value for a space.
Could someone throw some light on my earlier query?
rajashree.

 
V Srinivasan
Ranch Hand
Posts: 99
Hi,
Unicode is for characters, not for the system. As you said, let's hope somebody sheds some light.
Thanks & regards,
V.Srinivasan
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
The compiler makes an early pass over your source before it compiles anything. During that pass every Unicode escape is translated into its corresponding character. The \u000a got translated and USED as a real line break in your code, all before the character literal was even read. So what the compiler saw was:
Input-
char var='\u000a';
After parsing-
char var=' //new line used up here and now GONE
';
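
A minimal sketch of the workaround, assuming a standalone class (the name LineFeedLiteral and its helper methods are made up for illustration and do not come from the thread). It uses '\n' and a numeric cast instead of the Unicode escape, and shows that a Unicode escape for an ordinary character such as 'A' is harmless. The escape for U+000A is deliberately not written anywhere in the file, not even in a comment, because it would be turned into a real line break there as well.

```java
public class LineFeedLiteral {
    // '\n' is an escape sequence that is interpreted when the literal
    // itself is read, so no raw line break ever appears inside the quotes.
    static char viaEscapeSequence() {
        return '\n';
    }

    // A numeric cast also yields the line feed character (code point 10).
    static char viaCast() {
        return (char) 0x000a;
    }

    // A Unicode escape for an ordinary character is fine: U+0041 is 'A'.
    static char letterA() {
        return '\u0041';
    }

    public static void main(String[] args) {
        System.out.print("Hi");
        System.out.print(viaEscapeSequence());
        System.out.println("All");
    }
}
```

Running main should print "Hi" and "All" on separate lines, which is what the original snippet was trying to do.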
 
rajashree ghatak
Ranch Hand
Posts: 151
Hi Cindy,
Thanks for your response. But what I fail to understand is why there is a compile-time error when we use the Unicode values for escape sequence characters with hex digits a, b, c, d, e, f, like '\u000a' (new line), '\u000c' (form feed), '\u000d' (carriage return) or '\u005c' (backslash).
All of the above give the compile error: Invalid character constant.
But '\u0022' (double quote), '\u0008' (backspace) and '\u0009' (horizontal tab)
compile fine and execute. The one exception is '\u0027', the Unicode value for single quote, which again gives the compile error Invalid Character Constant.
Kindly comment on this.
rajashree.
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
The characters that are part of the language itself are the problem. Backspace and horizontal tab play no role in the syntax of a character literal, so they are understood to be what they claim to be. A double quote embedded in single quotes becomes '"', which is clear.
A single quote embedded in single quotes ( ''') causes a problem because, according to the rules of the syntax, the character value that you are defining is complete after the second single quote, but then you have an additional single quote hanging there which does not fit any correct Java syntax. The single quotes are PART OF the lexical structure of a character literal.
As a matter of fact, ANY character that Java uses to understand your syntax becomes a problem if you try to use it as a literal instead of as part of the syntax. Somehow you have to tell the compiler which way you intend that character to be used. So the rule is: if you want one of those syntax-involved characters to be treated as a literal instead of as part of your code, use the provided escape sequence (such as '\n', '\r', '\'' or '\\') instead.

From the JLS on the Lexical Structure of the language:
3.10.4 Character Literals

Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (§3.10.6). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'.
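
As a rough illustration of that rule, here is a small sketch (the class name SyntaxCharLiterals and its methods are hypothetical, chosen only for this example). The first three methods use escape sequences for characters that take part in the syntax of a literal. The last three use the Unicode escapes that were reported earlier in this thread to compile, since a double quote, a backspace and a tab do not interfere with a character literal.

```java
public class SyntaxCharLiterals {
    // Characters involved in the syntax of a literal: use escape sequences.
    static char singleQuote() {
        return '\'';
    }

    static char backslash() {
        return '\\';
    }

    static char carriageReturn() {
        return '\r';
    }

    // Unicode escapes for U+0022, U+0008 and U+0009 are harmless here,
    // because none of them ends the literal or the line.
    static char doubleQuote() {
        return '\u0022';
    }

    static char backspace() {
        return '\u0008';
    }

    static char tab() {
        return '\u0009';
    }

    public static void main(String[] args) {
        // Print the numeric value of each character rather than the
        // character itself, so control characters stay visible.
        System.out.println((int) singleQuote());
        System.out.println((int) backslash());
        System.out.println((int) carriageReturn());
        System.out.println((int) doubleQuote());
        System.out.println((int) backspace());
        System.out.println((int) tab());
    }
}
```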
 
V Srinivasan
Ranch Hand
Posts: 99
Thanks Cindy,
You have cleared my doubt too. Somewhere I read that the print() method doesn't understand Unicode characters but the write() method does. Is that so? Could you please write a few lines on that, or tell me where I can find a write-up on these issues?
Thanks in advance.
Regards,
V. Srinivasan
 
rajashree ghatak
Ranch Hand
Posts: 151
Thanks, Cindy.
You have explained it very well and also cleared up my query.
rajashree.
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
From the API for PrintStream:

All characters printed by a PrintStream are converted into bytes using the platform's default character encoding. The PrintWriter class should be used in situations that require writing characters rather than bytes.

A Java char is a 16-bit Unicode value, while a byte is only 8 bits, so a conversion has to happen somewhere.
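
A minimal sketch of what that conversion means in practice. The class PrintStreamEncodingDemo and its helper methods are assumptions made for this example. The encoding is passed explicitly instead of relying on the platform default, so the result does not depend on the machine. The euro sign (U+20AC) is used because ISO-8859-1 cannot represent it.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;

public class PrintStreamEncodingDemo {
    // Returns the bytes a PrintStream emits for one char under the
    // named encoding.
    static byte[] bytesFor(char c, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(c);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    // A PrintWriter over a character sink keeps the char as a char.
    static String charsFor(char c) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(c);
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        char euro = '\u20ac';
        System.out.println("UTF-8 bytes: " + bytesFor(euro, "UTF-8").length);
        System.out.println("ISO-8859-1 bytes: " + bytesFor(euro, "ISO-8859-1").length);
        System.out.println("PrintWriter kept the char: " + charsFor(euro).equals("\u20ac"));
    }
}
```

So print() does accept Unicode characters. Whether they survive depends on the encoding used to turn them into bytes: in this sketch the euro sign becomes three bytes in UTF-8 and is replaced by a single '?' in ISO-8859-1.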
 
V Srinivasan
Ranch Hand
Posts: 99
Thank you very much Cindy.
 
Prosenjit Banerjee
Greenhorn
Posts: 20
Thank you very much, Cindy Glass and the thread starter, rajashree ghatak. Here, look at what I did. It's a new experience.
The following is a Java program that consists only of Unicode characters (with some new lines added purely for the sake of clarity).
This code compiles and runs without any error.


This is what it actually looks like:

I LOVE JAVARANCH.COM
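
The code from the post above is not reproduced here. The following is only a hypothetical sketch of the same idea, with an invented class name (EscapedSource) and method: because Unicode escapes are translated before anything else, they can stand in for keywords and braces as well as for characters inside literals.

```java
public class EscapedSource {
    // Hypothetical example: the method below is spelled partly in Unicode
    // escapes. After translation it reads:
    //     static String greet() { return "Hi All"; }
    \u0073\u0074\u0061\u0074\u0069\u0063 String greet() \u007b
        return "Hi\u0020All";
    \u007d

    public static void main(String[] args) {
        System.out.println(greet());
    }
}
```

The escapes spell the keyword static, the two braces and the space inside the string. The compiler sees ordinary source text once they have been translated.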
 
Prosenjit Banerjee
Greenhorn
Posts: 20
Hi everybody,
Comments on my previous post are welcome. Please share them, because I want to compare my thinking with others'.
I LOVE JAVARANCH.COM
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
Very cute.
In Just Java 2, Peter van der Linden has some clever code tricks along these lines that you can use to bewilder your co-workers. It's really quite amusing.
 
Prosenjit Banerjee
Greenhorn
Posts: 20
Thanks very much Cindy. Thanks again for the reference.
 