Unicode Migraine!

Graham Mead
Ranch Hand

Joined: Sep 28, 2001
Posts: 57
Hi, could someone tell me at what point I'm getting confused?
If I have
char c = 'g';
int i = c;
then i is 103, which as far as I am aware is the Unicode value of 'g'.
But if I use
int i = Character.getNumericValue('g');
which the docs say returns the Unicode numeric value of a character,
I get 16.
And if I do
System.out.println('\u0103');
I get a '?', a strange symbol.
My head hurts!!!
Peter den Haan
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Ah, but you might be missing some of the basics of Unicode. The "numeric value" referred to is not what you think it is. The number corresponding to the Unicode character is simply (int)c, which is the 103 you already got.
The getNumericValue() method returns the "numeric value" Unicode property, i.e. the value represented by the character in a number. You probably know that in a hexadecimal (base-16) number, the letter 'f' is a digit representing the number 15. In exactly the same way, if your number is base-17 or higher, the letter 'g' is a digit representing the number 16. That is what getNumericValue() returns.
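To make the difference concrete, here is a small sketch contrasting the character's code with its "numeric value" property. The class and helper names (NumericValueDemo, codePoint, numericValue) are made up for illustration; the Character and Integer calls are the standard ones.

```java
class NumericValueDemo {
    // The character's code: a plain widening conversion from char to int.
    static int codePoint(char c) {
        return c;
    }

    // The "numeric value" property: the value the character has as a digit.
    static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    public static void main(String[] args) {
        System.out.println(codePoint('g'));            // 103
        System.out.println(numericValue('g'));         // 16
        System.out.println(numericValue('f'));         // 15
        System.out.println(numericValue('7'));         // 7

        // 'g' is only a valid digit when the radix is 17 or higher.
        System.out.println(Character.digit('g', 16));  // -1 (not a hex digit)
        System.out.println(Character.digit('g', 17));  // 16
        System.out.println(Integer.parseInt("g", 17)); // 16
    }
}
```

So 103 identifies the character itself, while 16 is what 'g' is worth when it appears as a digit in a number.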
The Character class gives access to a number of Unicode properties like this; other examples would be the type (aka category) and the case.
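As a rough illustration of those other properties, the sketch below queries the category and case of 'g'. The class name CharPropertiesDemo and its helper methods are hypothetical; Character.getType, Character.isLowerCase and Character.toUpperCase are standard methods.

```java
class CharPropertiesDemo {
    // True if the character's general category is "lowercase letter".
    static boolean isLowercaseLetterCategory(char c) {
        return Character.getType(c) == Character.LOWERCASE_LETTER;
    }

    // The uppercase mapping of the character, if it has one.
    static char upper(char c) {
        return Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseLetterCategory('g')); // true
        System.out.println(Character.isLowerCase('g'));     // true
        System.out.println(upper('g'));                     // G

        // A digit character falls into a different category.
        System.out.println(
            Character.getType('7') == Character.DECIMAL_DIGIT_NUMBER); // true
    }
}
```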
- Peter
[ March 05, 2003: Message edited by: Peter den Haan ]
Greg Charles

Joined: Oct 01, 2001
Posts: 2968

Speaking of hexadecimal, that's the system used in Unicode escape sequences. You're right that 'g' is 103 in decimal, but in hexadecimal it's 67, so the escape you need is \u0067. Your '\u0103' is hex 103, which is 259 in decimal: a different character altogether.
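If it helps, here is a minimal sketch of the decimal/hex relationship. The class name UnicodeEscapeDemo and the toEscape helper are invented for this example; only standard library calls are used.

```java
class UnicodeEscapeDemo {
    // Builds the escape-sequence text for a char: backslash, 'u', four hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char c = 'g';
        System.out.println((int) c);                    // 103 (decimal)
        System.out.println(Integer.toHexString(c));     // 67  (hexadecimal)
        System.out.println(Integer.parseInt("67", 16)); // 103
        System.out.println('\u0067' == 'g');            // true
        System.out.println(toEscape(c));                // the escape text for 'g'

        // Hex 103 is decimal 259, which is not 'g' at all.
        System.out.println((int) '\u0103');             // 259
    }
}
```

Whether the character for hex 103 shows up properly or as a '?' depends on what the console's encoding and font can display, so that part may differ from machine to machine.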