Weird unicode translation problem

Tony Evans
Ranch Hand

Joined: Jun 29, 2002
Posts: 573
I have a method that translates a string into a unicode string

public String unicodeEncoderAll(String str) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        // Appends a literal backslash and 'u', not a Unicode escape.
        sb.append("\\u");
        String hex = Integer.toHexString(ch & 0xFFFF);
        // Left-pad with zeros to four hex digits.
        for (int j = 0; j < 4 - hex.length(); j++) {
            sb.append("0");
        }
        sb.append(hex.toLowerCase());
    }
    return sb.toString();
}

It takes the string Hello and returns \u0048\u0065\u006c\u006c\u006f
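
For comparison, the same zero-padding can be left to a format string. This is only a sketch: the class name FormatEncoder and the method name encode are made up for illustration and are not part of the original post.

```java
class FormatEncoder {
    // Writes each char as a backslash, a 'u' and four lower-case hex digits.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            // %04x pads the hex value with leading zeros to a width of four.
            out.append(String.format("\\u%04x", (int) input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello"));
    }
}
```

Like the method above, this produces plain text containing backslashes, not Unicode escapes that Java will interpret.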

The problem is translating it back

String str = encoder.unicodeEncoderAll("Hello");
System.out.println("Test Decoded str "+encoder.testDecoder(str));

outputs \u0048\u0065\u006c\u006c\u006f

System.out.println("Test Decoded str "+encoder.testDecoder("\u0048\u0065\u006c\u006c\u006f"));

outputs Hello

So the string produced by the encoder does not decode back to Hello, but the same sequence decodes correctly when it is hard-coded as a literal.

Cheers for any help

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41587
    
The "\uxxxx" notation is only valid in some places (like javac, native2ascii and some others), not everywhere. The code is not actually adding Unicode characters; it adds a literal backslash, a "u" and 4 hex digits to a string. Think about it - how else would you construct a string that contains the literal sequence "\u0048\u0065\u006c\u006c\u006f"?
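
A small sketch of that difference, assuming nothing beyond the standard library (the class and constant names are hypothetical): the compiler translates the escapes in the first literal, while the doubled backslashes in the second keep them as ordinary text.

```java
class EscapeDemo {
    // javac translates these escapes at compile time,
    // so this string holds the five characters H, e, l, l, o.
    static final String COMPILED = "\u0048\u0065\u006c\u006c\u006f";

    // Doubling the backslash keeps each escape as plain text:
    // a backslash, a 'u' and four hex digits per character.
    static final String LITERAL = "\\u0048\\u0065\\u006c\\u006c\\u006f";

    public static void main(String[] args) {
        System.out.println(COMPILED + " has length " + COMPILED.length());
        System.out.println(LITERAL + " has length " + LITERAL.length());
    }
}
```

The first string has length 5 and the second has length 30, which is why a decoder sees two very different inputs.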

The testDecoder method needs to look for a backslash, a "u" and then 4 hex digits (a regexp can do that nicely), and then use the 4 digits to create a Character object which can be appended to the output string.
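
One way that approach could look is sketched below. The original testDecoder is not shown in the thread, so the class name UnicodeDecoder and the method name decode are assumptions, and text that is not part of an escape is simply copied through.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeDecoder {
    // A backslash, a 'u', then exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy any text before this escape unchanged.
            out.append(input, last, m.start());
            // Turn the four hex digits into a char.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0048\\u0065\\u006c\\u006c\\u006f"));
    }
}
```

Because the pattern insists on exactly four hex digits, malformed sequences are left as they are instead of causing a NumberFormatException.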


Tony Evans
Ranch Hand

Joined: Jun 29, 2002
Posts: 573
Thanks for the reply Ulf,

I have to admit I am all over the place on this. I take it there is no object that will take a string such as "u0079", know it is Unicode and return its Character?

I have looked at the object Character

I can do this

Character ch = new Character('\u00F6');

Hard code it, but I have a series of strings that are supposed to represent unicode.

Thanks for your help Tony
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41587
    
You could do something like this, where you would construct the value of "i" from the 4 digits:

// "H" is hex 0048.

int i = 0 * 4096 + 0 * 256 + 4 * 16 + 8;

char ch = (char) i;

Note that this does not work with Unicode characters beyond the BMP (which have numeric values larger than 65535).
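
To illustrate that limitation, here is a rough sketch using Character.toChars from the standard library, which returns a surrogate pair for code points above 65535. The class and method names are invented for the example.

```java
class SupplementaryDemo {
    // Converts any valid code point to a String;
    // values above 0xFFFF become a surrogate pair of two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int bmp = 0x0048;      // 'H', fits in a single char
        int beyond = 0x1F600;  // a code point outside the BMP

        System.out.println(fromCodePoint(bmp).length());     // 1
        System.out.println(fromCodePoint(beyond).length());  // 2

        // A plain cast keeps only the low 16 bits, giving a different character.
        char truncated = (char) beyond;
        System.out.println(Integer.toHexString(truncated));  // f600
    }
}
```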
Tony Evans
Ranch Hand

Joined: Jun 29, 2002
Posts: 573
Thanks Ulf, will give that a try.

Cheers Tony
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41587
    
Even easier, use Integer's built-in method:

String s = "0048";

int i = Integer.parseInt(s, 16);
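
Putting the parse and the cast together might look like this. It is a minimal sketch with a hypothetical helper name, toChar, and it only covers values that fit in a single char.

```java
class HexToChar {
    // Parses four hex digits such as "0048" and returns the matching char.
    static char toChar(String hexDigits) {
        int value = Integer.parseInt(hexDigits, 16);
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(toChar("0048")); // H
        System.out.println(toChar("0065")); // e
    }
}
```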
Tony Evans
Ranch Hand

Joined: Jun 29, 2002
Posts: 573
Thanks Ulf, now that I know Unicode a bit better I realised that I did not need to encode and decode, but I did it as a simple exercise anyway.

Here is my app if anyone else wants to play about with Unicode.
public class Converter {

    // Encodes every character as a backslash, a 'u' and four hex digits.
    public String uniCodeEnCodeAll(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            sb.append("\\u");
            String hex = Integer.toHexString(ch & 0xFFFF);
            for (int j = 0; j < 4 - hex.length(); j++) {
                sb.append("0");
            }
            sb.append(hex.toLowerCase());
        }
        return sb.toString();
    }

    public String readAString(String str) {
        return str;
    }

    // Decodes the output of uniCodeEnCodeAll back into the original string.
    public String decoder(String str) {
        // Append a dummy character so the last chunk also has a trailing character to drop.
        str = str + "#";
        StringBuffer sb = new StringBuffer();
        // After splitting on "u", each chunk is four hex digits plus one trailing character.
        String[] strArray = str.split("u");
        for (int index = 0; index < strArray.length; index++) {
            System.out.println(strArray[index]);
            Integer intg = getIntValue(strArray[index]);
            if (intg != null) {
                char ch = (char) intg.intValue();
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Drops the trailing character and parses the rest as hex.
    // Returns null for chunks too short to hold any digits.
    private Integer getIntValue(String str) {
        char[] ch = str.toCharArray();
        if (ch.length > 1) {
            StringBuffer sb = new StringBuffer();
            for (int index = 0; index < ch.length - 1; index++) {
                sb.append(ch[index]);
            }
            str = sb.toString();
            System.out.println("Process String " + str);
            return Integer.parseInt(str, 16);
        }
        return null;
    }

}
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19672
    

Please UseCodeTags. You can use the edit button to add them.

