
Unicode Migraine!

 
Graham Mead
Ranch Hand
Posts: 57
Hi, could someone tell me where I'm getting confused?
If I have
char c = 'g';
int i = c;
then i is 103, which as far as I am aware is the Unicode value of 'g'.
But if I use
int i = Character.getNumericValue('g');
which the docs say returns the Unicode numeric value of a character,
I get 16.
If I do
System.out.println('\u0103');
I get a '?',
and with
System.out.println('\u0016');
I get a strange symbol.
My head hurts!!!
 
Peter den Haan
author
Ranch Hand
Posts: 3252
Ah, but you might be missing some of the basics of Unicode. The "numeric value" referred to is not what you think it is. The number corresponding to the Unicode character is simply (int) c!
The getNumericValue() method returns the "numeric value" Unicode property, i.e. the value the character represents when it is used as a digit in a number. You probably know that in a hexadecimal (base-16) number, the letter 'f' is a digit representing the number 15. In exactly the same way, if your number is base-17 or higher, the letter 'g' is a digit representing the number 16. That is what getNumericValue() returns.
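A minimal sketch of the difference between the two numbers, in case it helps. The class name NumericValueDemo and its helper methods are made up for illustration; Character.getNumericValue, Character.digit and Integer.parseInt are the standard JDK calls.

```java
public class NumericValueDemo {
    // Code point: the number Unicode assigns to the character itself.
    static int codePoint(char c) {
        return (int) c;
    }

    // "Numeric value" property: what the character is worth as a digit.
    static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    public static void main(String[] args) {
        System.out.println(codePoint('g'));            // 103
        System.out.println(numericValue('g'));         // 16
        System.out.println(numericValue('f'));         // 15
        System.out.println(numericValue('7'));         // 7

        // Character.digit also checks the radix: 'g' is a valid digit
        // in base 17 but not in base 16.
        System.out.println(Character.digit('g', 17));  // 16
        System.out.println(Character.digit('g', 16));  // -1

        // Parsing the one-digit base-17 number "g" gives the same 16.
        System.out.println(Integer.parseInt("g", 17)); // 16
    }
}
```

If the goal is to turn a character into a digit of a known base, Character.digit is probably the safer choice, since it returns -1 for characters that are not valid in that base.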
The Character class gives access to a number of Unicode properties like this; other examples would be the type (aka category) and the case.
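A short sketch of the type and case properties mentioned here. The class name CharPropsDemo and the helper isLowercaseLetter are hypothetical; the Character methods and constants are from the standard JDK.

```java
public class CharPropsDemo {
    // True if the character's Unicode general category is "Ll" (lowercase letter).
    static boolean isLowercaseLetter(char c) {
        return Character.getType(c) == Character.LOWERCASE_LETTER;
    }

    public static void main(String[] args) {
        // Type (category) property
        System.out.println(isLowercaseLetter('g'));  // true
        System.out.println(Character.getType('7') == Character.DECIMAL_DIGIT_NUMBER); // true

        // Case property and case mapping
        System.out.println(Character.isLowerCase('g')); // true
        System.out.println(Character.toUpperCase('g')); // G
    }
}
```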
- Peter
[ March 05, 2003: Message edited by: Peter den Haan ]
 
Greg Charles
Sheriff
Posts: 2984
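Speaking of hexadecimal, that's the system used in Unicode escape sequences. You're right that 'g' is 103 in decimal, but in hexadecimal it's 67, so the escape you need is \u0067. Your two escapes were read as hex 103 and hex 16, which are different characters entirely: an 'a' with a breve, which your console apparently can't display, and a control character.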
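Here is a small sketch of the decimal-to-hex step, assuming you just want to see which escape corresponds to a given character. EscapeDemo and toEscape are names invented for this example; Integer.toHexString and String.format are standard JDK calls.

```java
public class EscapeDemo {
    // Builds the escape text for a character: a backslash, 'u',
    // and the code point as four hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString('g')); // 67 (103 decimal)
        System.out.println('\u0067');                 // g
        System.out.println(toEscape('g'));            // the six-character escape for g

        // The escape from the original question is hex 103, i.e. 259 decimal,
        // which is a different character from 'g'.
        System.out.println((int) '\u0103');           // 259
    }
}
```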
 