| Author |
Converting special characters in a String to equivalent Unicode Escape Code
|
Ripan Singh
Greenhorn
Joined: Aug 11, 2011
Posts: 5
|
|
Hi
I want to convert special characters (e.g. non-ASCII Latin letters) in a String to their equivalent Unicode escape codes.
Example: the String "JØran" should be converted to "J\u00D8ran".
Does Java provide any function or utility for this, or is there a third-party library?
Thanks in advance.
|
 |
Vijay Tidake
Ranch Hand
Joined: Nov 04, 2008
Posts: 146
|
|
Hi,
You can use Java's native2ascii.exe in the <JAVA_HOME>/bin directory.
The command is native2ascii -encoding UTF-8 <text file containing JØran> <text file to receive the Unicode escapes>
Thanks
|
The important thing is not to stop questioning. Curiosity has its own reason for existing.
|
 |
Ripan Singh
Greenhorn
Joined: Aug 11, 2011
Posts: 5
|
|
I want to convert the special characters inside a Java program, where I have several independent String values.
I don't have a text file to convert.
Doesn't Java provide a class or method that can operate directly on a String?
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
|
You don't need anything special. You think that a char is a character, but it isn't. It is an unsigned integer. There are all sorts of methods in the Character class which allow you to see which ranges a char is in. If it is in a particular range, you can convert it to hex and add a \u tag. Then use a StringBuilder to put everything back together.
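To illustrate the point that a char is really a number, here is a minimal sketch. The class name CharAsNumber and the helper toEscape are hypothetical names chosen for this example, not anything from the JDK or from this thread.

```java
public class CharAsNumber {
    // Formats a single char as a four-digit hex escape (backslash, 'u', hex digits).
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char c = '\u00d8';                          // capital O with stroke, written as an escape
        System.out.println((int) c);                // the char's numeric value
        System.out.println(Character.isLetter(c));  // one of the Character class tests
        System.out.println(toEscape(c));            // the escape text for this char
    }
}
```

The cast to int is what exposes the numeric value; String.format with %04x then pads it to four hex digits.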
|
 |
Ripan Singh
Greenhorn
Joined: Aug 11, 2011
Posts: 5
|
|
Hi Campbell
I would really appreciate it if you could write some sample code for this.
Example: converting the String "JØran" to "J\u00D8ran".
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
I did. It took me about ten minutes. 31 lines or 28 if you miss out the blanks.
campbell@computer_name:~/java> java UnicodeCreator "Campbell Ritchie ßüÜöÖäļ JØran"
Campbell Ritchie \u00df\u00fc\u00dc\u00f6\u00d6\u00e4\u00c4\u00bc J\u00d8ran
We don't provide ready-made code.
|
 |
Ripan Singh
Greenhorn
Joined: Aug 11, 2011
Posts: 5
|
|
This is really super cool.
I know the whole code can't be shared as per policy, but could you kindly provide a few core lines of code to give me an idea?
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
Vijay Tidake wrote: . . . native2ascii.exe in the <JAVA_HOME>/bin directory. . . .
I never knew about that. Thank you. It works nicely; all non-ASCII characters are changed to their Unicode® escapes.
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
You get the chars from the String as a char[].
You iterate that array; if a char is in your "normal" range, you append it to a StringBuilder.
If it is outwith your "normal" range, you append \u and its 4-digit hex representation to the StringBuilder.
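The three steps above can be sketched roughly as follows. This is not the UnicodeCreator program mentioned earlier, which was never posted; the class name UnicodeEscaper and the method escapeNonAscii are hypothetical, and the sketch assumes that the "normal" range means 7-bit ASCII (0x00 to 0x7f).

```java
public class UnicodeEscaper {
    // Assumption: "normal" chars are 7-bit ASCII; change the test to suit your own range.
    static String escapeNonAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c <= 0x7f) {
                sb.append(c);                                  // normal range: copy unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));  // otherwise: escape as 4 hex digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escapes each command-line argument and prints it on its own line.
        for (String arg : args) {
            System.out.println(escapeNonAscii(arg));
        }
    }
}
```

With this sketch, escapeNonAscii("JØran") returns J\u00d8ran, provided the source file or command line is read with the correct encoding.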
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
|
I don't know how that will work for characters and glyphs whose Unicode® value is > 0xffff (65535).
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19232
|
|
|
Those aren't valid Java chars anyway, as Java only goes from 0 to 65535.
|
SCJP 1.4 - SCJP 6 - SCWCD 5
|
 |
Ripan Singh
Greenhorn
Joined: Aug 11, 2011
Posts: 5
|
|
Thanks a lot, Campbell. I got it.
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
Well done
|
 |
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32831
|
|
Rob Spoor wrote: Those aren't valid Java chars anyway, as Java only goes from 0 to 65535.
They are formed from two chars put together to form a code point, which is of type int. You can probably iterate the String getting code points, some of which would be > 0xffff.
As I said, I don't know how my technique ought to handle them. You could split them with i & 0xffff or i >> 0x10 & 0xffff. Remember >> has a higher precedence than &.
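One possible way to handle code points above 0xffff is sketched below, assuming Java 8 or later for String.codePoints(). Note that masking and shifting gives the two 16-bit halves of the int, which are not the same values as the surrogate pair; Character.toChars returns the actual pair of chars. The class name CodePointEscaper is hypothetical, and "normal" is again assumed to mean 7-bit ASCII.

```java
public class CodePointEscaper {
    // Assumption: code points up to 0x7f are "normal"; everything else is escaped.
    static String escapeNonAscii(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp <= 0x7f) {
                sb.append((char) cp);
            } else {
                // toChars yields one char for values up to 0xffff and a surrogate pair above that
                for (char c : Character.toChars(cp)) {
                    sb.append(String.format("\\u%04x", (int) c));
                }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies above 0xffff, so it is stored as two chars in the String.
        String s = "J\u00d8ran " + new String(Character.toChars(0x1F600));
        System.out.println(escapeNonAscii(s));
    }
}
```

Because each half of a surrogate pair is itself a char above 0x7f, a plain loop over the char[] would produce the same escapes here; going through code points only matters if the "normal" range test needs to look at the whole character.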
|
 |
 |