
Weird unicode translation problem

 
Tony Evans
Ranch Hand
Posts: 598
1
I have a method that translates a string into a string of Unicode escape sequences:

public String unicodeEncoderAll(String str) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        // Emit a literal backslash and "u", not a Unicode escape
        sb.append("\\u");
        String hex = Integer.toHexString(ch & 0xFFFF);
        // Left-pad the hex value with zeros to four digits
        for (int j = 0; j < 4 - hex.length(); j++) {
            sb.append("0");
        }
        sb.append(hex.toLowerCase());
    }
    return sb.toString();
}

It takes the string Hello and returns \u0048\u0065\u006c\u006c\u006f
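
For comparison, here is a more compact sketch that should produce the same output. It assumes String.format with the %04x conversion is acceptable in place of the manual zero padding. The class and method names are made up for illustration and are not from the original post.

```java
class UnicodeEscapeEncoderSketch {
    // Hypothetical helper: writes each char as a literal backslash, "u"
    // and four lowercase hex digits.
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() * 6);
        for (int i = 0; i < input.length(); i++) {
            // %04x pads the hex value with leading zeros to four digits
            out.append(String.format("\\u%04x", (int) input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints six characters per input character
        System.out.println(encode("Hello"));
    }
}
```

Like the original method, this works on char values, so a character outside the BMP comes out as two escapes (its surrogate pair).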

The problem is translating it back

String str = encoder.unicodeEncoderAll("Hello");
System.out.println("Test Decoded str "+encoder.testDecoder(str));

outputs \u0048\u0065\u006c\u006c\u006f

System.out.println("Test Decoded str "+encoder.testDecoder("\u0048\u0065\u006c\u006c\u006f"));

outputs Hello

So the string produced by the encoder is not decoded back to Hello, but the same sequence is decoded when it is hard-coded as a literal.

Cheers for any help

 
Ulf Dittmer
Rancher
Posts: 42968
73
The "\uxxxx" notation is only valid in some places (like Java source code read by javac, native2ascii and some others), not everywhere. Your code is not actually adding Unicode characters; it adds a literal backslash, a "u" and 4 hex digits to the string. Think about it: how else would you construct a string that contains the literal sequence "\u0048\u0065\u006c\u006c\u006f"?
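
The difference can be seen by comparing string lengths. This is only a small sketch, and the class and constant names are invented for illustration.

```java
class EscapeVersusLiteralSketch {
    // javac translates this escape at compile time, so the string holds
    // a single character: H
    static final String COMPILED = "\u0048";

    // Here the backslash itself is escaped, so the string holds six
    // characters: backslash, u, 0, 0, 4, 8
    static final String LITERAL = "\\u0048";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());        // 1
        System.out.println(LITERAL.length());         // 6
        System.out.println(COMPILED.equals(LITERAL)); // false
    }
}
```

The encoder builds strings of the second kind, which is why nothing translates them back automatically.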

The testDecoder method needs to look for a backslash, a "u" and then 4 hex digits (a regexp can do that nicely), and then use the 4 digits to create a char which can be appended to the output string.
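
A minimal sketch of that approach follows. The body of testDecoder was not posted, so the class name, method name and pattern here are assumptions rather than the original code. It uses java.util.regex and copies any text between matches unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoderSketch {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before this match as it is
            out.append(input, last, m.start());
            // Turn the four hex digits into a char (BMP values only)
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final match
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0048\\u0065\\u006c\\u006c\\u006f")); // Hello
    }
}
```

Input that contains no escape sequences is returned unchanged.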
 
Tony Evans
Ranch Hand
Posts: 598
1
Thanks for the reply Ulf,

I have to admit I am all over the place on this. I take it there is no class that will take a string such as "u0079", recognise it as Unicode and return the corresponding Character.

I have looked at the object Character

I can do this

Character ch = new Character('\u00F6');

That is hard-coded, though, and I have a series of strings that are supposed to represent Unicode characters.

Thanks for your help Tony
 
Ulf Dittmer
Rancher
Posts: 42968
73
You could do something like this, where you would construct the value of "i" from the 4 digits:

// "H" is hex 0048.

int i = 0 * 4096 + 0 * 256 + 4 * 16 + 8;

char ch = (char) i;

Note that this does not work with Unicode characters beyond the BMP (which have numeric values larger than 65535).
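
To illustrate the BMP limitation, here is a small sketch. The class and method names are made up for this example. It compares the plain cast with Character.toChars, which returns a surrogate pair for values above 65535.

```java
class SupplementaryCodePointSketch {
    // Casting to char keeps only the low 16 bits, so it is only safe inside the BMP.
    static String viaCast(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // Character.toChars returns one char for BMP values and two chars
    // (a surrogate pair) for values above 0xFFFF.
    static String viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int h = 0 * 4096 + 0 * 256 + 4 * 16 + 8; // hex 0048
        System.out.println(viaCast(h));           // H

        int supplementary = 0x1F600;              // outside the BMP
        System.out.println(viaToChars(supplementary).length());                         // 2
        System.out.println(viaToChars(supplementary).codePointAt(0) == supplementary);  // true
        System.out.println(viaCast(supplementary).codePointAt(0) == supplementary);     // false
    }
}
```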
 
Tony Evans
Ranch Hand
Posts: 598
1
Thanks Ulf, I will give that a try.

Cheers Tony
 
Ulf Dittmer
Rancher
Posts: 42968
73
Even easier, use Integer's built-in method:

String s = "0048";

int i = Integer.parseInt(s, 16);
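
Combined with the cast from the earlier post, that could look roughly like this. The class and method names are hypothetical. Note that Integer.parseInt throws NumberFormatException if the string is not valid hex.

```java
class HexToCharSketch {
    // Parses the hex digits of one escape and returns the matching BMP character.
    static char toChar(String hexDigits) {
        int value = Integer.parseInt(hexDigits, 16); // NumberFormatException on bad input
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(toChar("0048"));              // H
        System.out.println((int) toChar("00F6") == 0xF6); // true
    }
}
```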
 
Tony Evans
Ranch Hand
Posts: 598
1
Thanks Ulf. Now that I know Unicode a bit better, I realise that I did not need to encode and decode at all, but I did it as a simple exercise anyway.

Here is my app if anyone else wants to play about with Unicode.
public class Converter {

    // Encodes every character as a literal backslash, "u" and four hex digits.
    public String uniCodeEnCodeAll(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            sb.append("\\u");
            String hex = Integer.toHexString(ch & 0xFFFF);
            for (int j = 0; j < 4 - hex.length(); j++) {
                sb.append("0");
            }
            sb.append(hex.toLowerCase());
        }
        return sb.toString();
    }

    public String readAString(String str) {
        return str;
    }

    // Decodes the output of uniCodeEnCodeAll. Splitting on "u" leaves each
    // hex group followed by one extra character: the next backslash, or the
    // "#" sentinel appended here for the last group.
    public String decoder(String str) {
        str = str + "#";
        StringBuffer sb = new StringBuffer();
        String[] strArray = str.split("u");
        for (int index = 0; index < strArray.length; index++) {
            System.out.println(strArray[index]);
            Integer intg = getIntValue(strArray[index]);
            if (intg != null) {
                char ch = (char) intg.intValue();
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Drops the trailing character and parses the rest as hex;
    // returns null for the lone backslash that precedes the first "u".
    private Integer getIntValue(String str) {
        char[] ch = str.toCharArray();
        if (ch.length > 1) {
            StringBuffer sb = new StringBuffer();
            for (int index = 0; index < ch.length - 1; index++) {
                sb.append(ch[index]);
            }
            str = sb.toString();
            System.out.println("Process String " + str);
            return Integer.parseInt(str, 16);
        }
        return null;
    }

}
 
Rob Spoor
Sheriff
Pie
Posts: 20545
56
Please UseCodeTags. You can use the edit button to add them.
 