Converting special characters in a String to equivalent Unicode Escape Code

Ripan Singh
Greenhorn

Joined: Aug 11, 2011
Posts: 5
Hi

I want to convert special characters (e.g. non-ASCII Latin letters) in a String to their equivalent Unicode escape codes.
Example: the String "JØran" should be converted to "J\u00D8ran".

Does Java provide a function or utility for this, or is there a third-party library?
Thanks in advance
Vijay Tidake
Ranch Hand

Joined: Nov 04, 2008
Posts: 146

Hi,

You can use native2ascii.exe in the <JAVA_HOME>/bin directory.

The command is native2ascii -encoding UTF-8 <text file containing JØran> <text file to receive the Unicode escapes>

Thanks


The important thing is not to stop questioning. Curiosity has its own reason for existing.
Ripan Singh
Greenhorn

Joined: Aug 11, 2011
Posts: 5
I want to convert the special characters inside a Java program, where I have several independent String values.
I don't have a text file to convert.
Doesn't Java provide a class or method that can operate directly on a String?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
You don't need anything special. You think that a char is a character, but it isn't. It is an unsigned 16-bit integer. There are all sorts of methods in the Character class which allow you to see which ranges a char is in. If it is in a particular range, you can convert it to hex and add a \u prefix. Then use a StringBuilder to put everything back together.
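To illustrate the point that a char is really a number, here is a small sketch. The class name CharAsNumber, the helper isNonAscii, and the cutoff of 127 are assumptions chosen for the example, not something prescribed in this thread.

```java
public class CharAsNumber {
    // True when the char lies outside 7-bit ASCII; the cutoff of 127 is an assumption.
    static boolean isNonAscii(char c) {
        return c > 127;
    }

    public static void main(String[] args) {
        char c = '\u00D8';                               // Latin capital letter O with stroke
        int value = c;                                   // implicit widening, no cast needed
        System.out.println(value);                       // prints 216
        System.out.println(Integer.toHexString(value));  // prints d8
        System.out.println(isNonAscii(c));               // prints true
        // The Character class can also report which Unicode block a char belongs to.
        System.out.println(Character.UnicodeBlock.of(c)
                == Character.UnicodeBlock.LATIN_1_SUPPLEMENT); // prints true
    }
}
```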
Ripan Singh
Greenhorn

Joined: Aug 11, 2011
Posts: 5
Hi Campbell

I would highly appreciate it if you could write some sample code for this.
Example: converting the String "JØran" to "J\u00D8ran".
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
I did. It took me about ten minutes. 31 lines or 28 if you miss out the blanks.
campbell@computer_name:~/java> java UnicodeCreator "Campbell Ritchie ßüÜöÖäÄ¼ JØran"
Campbell Ritchie \u00df\u00fc\u00dc\u00f6\u00d6\u00e4\u00c4\u00bc J\u00d8ran
We don't provide ready-made code.
Ripan Singh
Greenhorn

Joined: Aug 11, 2011
Posts: 5
This is really super cool

I know the whole code can't be shared as per policy; kindly provide a few core lines of code to give me an idea.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
Vijay Tidake wrote: . . . native2ascii.exe in the <JAVA_HOME>/bin directory. . . .
I never knew about that. Thank you. It works nicely; all non-ASCII characters are changed to their Unicode® escapes.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
You get the chars from the String as a char[].
You iterate that array; if a char is in your "normal" range, you append it to a StringBuilder.
If it is outwith your "normal" range, you append \u and its 4-digit hex representation to the StringBuilder.
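The three steps above can be sketched roughly as follows. This is a minimal sketch and not the UnicodeCreator program run earlier in the thread: the class name UnicodeEscaper, the methods isNormal and escape, and the choice of printable ASCII (0x20 to 0x7e) as the "normal" range are all assumptions made for illustration.

```java
public class UnicodeEscaper {
    // Assumed "normal" range: printable 7-bit ASCII, space (0x20) through tilde (0x7e).
    static boolean isNormal(char c) {
        return c >= 0x20 && c <= 0x7e;
    }

    // Copies "normal" chars unchanged and replaces every other char with a
    // backslash, the letter u, and four lowercase hex digits.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (isNormal(c)) {
                sb.append(c);
            } else {
                // The cast is required: %x accepts an int but not a char.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escapes the first command-line argument, or a sample String if none is given.
        String text = args.length > 0 ? args[0] : "J\u00D8ran";
        System.out.println(escape(text));
    }
}
```

Run without arguments, this prints J\u00d8ran. A backslash already present in the input is copied through unchanged, so the result is only reversible if backslashes are escaped as well.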
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
I don't know how that will work for characters and glyphs whose Unicode® value is > 0xffff (65535).
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19232

Those aren't valid Java chars anyway, as Java only goes from 0 to 65535.


SCJP 1.4 - SCJP 6 - SCWCD 5
Ripan Singh
Greenhorn

Joined: Aug 11, 2011
Posts: 5
Thanks a lot, Campbell. I got it.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
Well done
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 32831
Rob Spoor wrote: Those aren't valid Java chars anyway, as Java only goes from 0 to 65535.
They are formed from two chars put together to form a code point, which is of type int. You can probably iterate the String getting code points, some of which would be > 0xffff.
As I said, I don't know how my technique ought to handle them. You could split them with i & 0xffff or i >> 0x10 & 0xffff. Remember >> has a higher precedence than &.
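One possible way to handle code points above 0xffff is sketched below, assuming Java 8 or later for String.codePoints(). Note that the plain 16-bit halves obtained by masking and shifting are not the surrogate chars that Java actually stores, so this sketch swaps in Character.toChars to obtain the pair and escapes each char separately. The class name CodePointEscaper and the printable-ASCII cutoff are assumptions for illustration.

```java
public class CodePointEscaper {
    // Keeps printable ASCII (an assumed cutoff) and writes every other code
    // point as one escape per UTF-16 code unit.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp <= 0x7e) {
                sb.appendCodePoint(cp);
            } else {
                // Character.toChars yields one char for code points up to 0xffff
                // and a surrogate pair for anything above that.
                for (char unit : Character.toChars(cp)) {
                    sb.append(String.format("\\u%04x", (int) unit));
                }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600)); // U+1F600, above 0xffff
        System.out.println(smiley.length());                    // prints 2
        System.out.println(escape("J" + smiley));
        int i = 0x1F600;
        // Shift binds tighter than bitwise AND, so both sides are the same value.
        System.out.println((i >> 0x10 & 0xffff) == ((i >> 0x10) & 0xffff)); // prints true
    }
}
```

For U+1F600 the second line of output is J\ud83d\ude00: two escapes, one for each char of the surrogate pair. Iterating the String char by char would produce the same escapes, because the surrogates are already stored as two separate chars.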