
Weird unicode translation problem

 
Ranch Hand
Posts: 681
I have a method that converts a string into a string of Unicode escape sequences:

public String unicodeEncoderAll(String str) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        sb.append("\\u");
        // Hex value of the UTF-16 code unit, zero-padded to four digits below
        String hex = Integer.toHexString(str.charAt(i) & 0xFFFF);
        for (int j = 0; j < 4 - hex.length(); j++) {
            sb.append("0");
        }
        sb.append(hex.toLowerCase());
    }
    return sb.toString();
}

Given the string "Hello", it returns \u0048\u0065\u006c\u006c\u006f
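A more compact way to produce the same output is String.format with a zero-padded hex conversion. The sketch below uses a hypothetical helper class, UnicodeEscaper; like the method above, it escapes each UTF-16 char separately, so a character outside the BMP would come out as two escapes.

```java
import java.util.Locale;

// Hypothetical helper: writes every UTF-16 char of a string as a backslash,
// a 'u' and four lowercase hex digits.
final class UnicodeEscaper {
    private UnicodeEscaper() {}

    static String escape(String str) {
        StringBuilder sb = new StringBuilder(str.length() * 6);
        for (int i = 0; i < str.length(); i++) {
            // %04x zero-pads the char's numeric value to four hex digits;
            // Locale.ROOT keeps the output independent of the default locale
            sb.append(String.format(Locale.ROOT, "\\u%04x", (int) str.charAt(i)));
        }
        return sb.toString();
    }
}
```

For example, UnicodeEscaper.escape("Hello") yields the same six-characters-per-letter string as unicodeEncoderAll("Hello").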

The problem is translating it back

String str = encoder.unicodeEncoderAll("Hello");
System.out.println("Test Decoded str "+encoder.testDecoder(str));

outputs \u0048\u0065\u006c\u006c\u006f

System.out.println("Test Decoded str "+encoder.testDecoder("\u0048\u0065\u006c\u006c\u006f"));

outputs Hello

So the encoded string does not come back out as "Hello", but the same sequence does when it is hard-coded as a literal.

Cheers for any help

 
Rancher
Posts: 43028
The "\uxxxx" notation is only interpreted in some places (such as by javac when it reads source code, by the native2ascii tool, and by a few others), not everywhere. The code is not actually adding Unicode characters; it appends a literal backslash, a "u" and 4 hex digits to a string. Think about it: how else would you construct a string that contains the literal sequence "\u0048\u0065\u006c\u006c\u006f"?
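To make the distinction concrete: javac translates a \u escape in source code before the program ever runs, while a doubled backslash leaves all six characters in the string. A minimal sketch with a hypothetical class, EscapeDemo:

```java
// Hypothetical demo class contrasting a compile-time escape with literal text.
final class EscapeDemo {
    private EscapeDemo() {}

    // javac translates this escape while compiling, so the string holds one char: 'H'
    static final String COMPILED = "\u0048";

    // The doubled backslash survives compilation: six chars (backslash, u, 0, 0, 4, 8)
    static final String RUNTIME = "\\u0048";

    public static void main(String[] args) {
        System.out.println(COMPILED + " has length " + COMPILED.length()); // prints: H has length 1
        System.out.println(RUNTIME + " has length " + RUNTIME.length());   // prints the six raw chars, then: has length 6
    }
}
```

This is also why the hard-coded call in the question printed Hello: testDecoder received a string that javac had already translated.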

The testDecoder method needs to look for a backslash, a "u" and then 4 hex digits (a regular expression can do that nicely), and then use those 4 digits to create a Character object which can be appended to the output string.
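A minimal sketch of the regular-expression approach described above. The class name UnicodeUnescaper is hypothetical; it uses only java.util.regex and copies any text between escapes through unchanged, so an ordinary "u" elsewhere in the input is not a problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces each backslash-u-plus-four-hex-digits escape
// in the input with the single char it names.
final class UnicodeUnescaper {
    private UnicodeUnescaper() {}

    // A literal backslash, a 'u', then exactly four hex digits (captured as group 1)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String str) {
        Matcher m = ESCAPE.matcher(str);
        StringBuilder sb = new StringBuilder(str.length());
        int last = 0;
        while (m.find()) {
            sb.append(str, last, m.start());                     // text before this escape, unchanged
            sb.append((char) Integer.parseInt(m.group(1), 16));  // four hex digits -> one UTF-16 char
            last = m.end();
        }
        sb.append(str, last, str.length());                      // any trailing text after the last escape
        return sb.toString();
    }
}
```

Copying the matched spans by hand, rather than using Matcher.replaceAll, avoids having to escape backslashes and dollar signs in the replacement text.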
 
Tony Evans
Ranch Hand
Posts: 681
Thanks for the reply, Ulf.

I have to admit I am all over the place on this. I take it there is no class that will take a string such as "u0079", recognise it as Unicode and return the corresponding Character.

I have looked at the Character class.

I can hard-code a value like this:

Character ch = Character.valueOf('\u00F6');

but I have a series of strings that are supposed to represent Unicode characters.

Thanks for your help, Tony
 
Ulf Dittmer
Rancher
Posts: 43028
You could do something like this, where you construct the value of "i" from the 4 hex digits:

// "H" is hex 0048.

int i = 0 * 4096 + 0 * 256 + 4 * 16 + 8;

char ch = (char) i;
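The same positional arithmetic can be written as a loop over the digits (Horner's rule): multiply the running value by 16 and add the next digit. A sketch with a hypothetical helper, HexDigits.toChar, that assumes exactly four hex digits:

```java
// Hypothetical helper: generalizes 0 * 4096 + 0 * 256 + 4 * 16 + 8
// by folding the hex digits from left to right.
final class HexDigits {
    private HexDigits() {}

    static char toChar(String fourHexDigits) {
        if (fourHexDigits.length() != 4) {
            throw new IllegalArgumentException("Expected 4 hex digits: " + fourHexDigits);
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int digit = Character.digit(fourHexDigits.charAt(i), 16); // -1 if not a hex digit
            if (digit < 0) {
                throw new IllegalArgumentException("Not a hex digit: " + fourHexDigits.charAt(i));
            }
            value = value * 16 + digit; // shift earlier digits one hex place left, add this one
        }
        return (char) value; // four hex digits always fit in a char (max 0xFFFF)
    }
}
```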

Note that this does not work with Unicode characters beyond the BMP (Basic Multilingual Plane), whose code points are larger than 65535 (0xFFFF).
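A char holds a single UTF-16 code unit, so one cast cannot represent a code point above 0xFFFF; Java stores such a character as a surrogate pair of two chars. Because the encoders in this thread escape one char at a time, such a character comes out as two escapes, and decoding each escape to a char and appending both restores it. A sketch with a hypothetical class, BeyondBmp, using U+1F600 purely as an example:

```java
// Hypothetical demo: a code point above U+FFFF needs two chars (a surrogate pair).
final class BeyondBmp {
    private BeyondBmp() {}

    static final int GRINNING_FACE = 0x1F600; // example code point outside the BMP

    // Character.toChars returns one char for BMP code points, two for the rest
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Decodes two four-digit hex strings separately, one char each,
    // and appends them; together they form the surrogate pair again.
    static String fromTwoEscapes(String highHex, String lowHex) {
        return new StringBuilder()
                .append((char) Integer.parseInt(highHex, 16))
                .append((char) Integer.parseInt(lowHex, 16))
                .toString();
    }
}
```

Here fromCodePoint(0x1F600) has length 2, and fromTwoEscapes("d83d", "de00") produces the same string.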
 
Tony Evans
Ranch Hand
Posts: 681
Thanks Ulf, I will give that a try.

Cheers Tony
 
Ulf Dittmer
Rancher
Posts: 43028
Even easier, use Integer's built-in method:

String s = "0048";

int i = Integer.parseInt(s, 16);
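Applied to a whole encoded string, Integer.parseInt can decode input made only of six-character escapes (such as the encoder's output) by stepping through it six characters at a time. A sketch assuming well-formed input; the class FixedWidthDecoder is hypothetical:

```java
// Hypothetical helper: decodes a string consisting only of escapes of the form
// backslash, 'u', four hex digits, using Integer.parseInt(hex, 16) on each group.
final class FixedWidthDecoder {
    private FixedWidthDecoder() {}

    static String decode(String escaped) {
        if (escaped.length() % 6 != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 6");
        }
        StringBuilder sb = new StringBuilder(escaped.length() / 6);
        for (int i = 0; i < escaped.length(); i += 6) {
            if (escaped.charAt(i) != '\\' || escaped.charAt(i + 1) != 'u') {
                throw new IllegalArgumentException("Expected an escape at index " + i);
            }
            String hex = escaped.substring(i + 2, i + 6); // the four hex digits
            sb.append((char) Integer.parseInt(hex, 16));  // e.g. "0048" -> 72 -> 'H'
        }
        return sb.toString();
    }
}
```

Unlike the regular-expression version, this rejects any input that contains ordinary text between escapes.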
 
Tony Evans
Ranch Hand
Posts: 681
Thanks Ulf. Now that I know Unicode a bit better, I realise that I did not need to encode and decode, but I did it as a simple exercise anyway.

Here is my app if anyone else wants to play around with Unicode:
public class Converter {

    // Escapes every UTF-16 char of str as a backslash, 'u' and four lowercase hex digits.
    public String uniCodeEnCodeAll(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            sb.append("\\u");
            String hex = Integer.toHexString(str.charAt(i) & 0xFFFF);
            for (int j = 0; j < 4 - hex.length(); j++) {
                sb.append("0"); // zero-pad to four digits
            }
            sb.append(hex.toLowerCase());
        }
        return sb.toString();
    }

    public String readAString(String str) {
        return str;
    }

    // Decodes output of uniCodeEnCodeAll only: it splits on "u", so any other
    // text that contains a 'u' would be decoded incorrectly.
    public String decoder(String str) {
        str = str + "#"; // sentinel: gives the last piece a trailing char to drop
        StringBuilder sb = new StringBuilder();
        for (String piece : str.split("u")) {
            Integer intg = getIntValue(piece);
            if (intg != null) {
                sb.append((char) intg.intValue());
            }
        }
        return sb.toString();
    }

    // Each piece ends with the backslash that preceded the next "u", or with the
    // '#' sentinel; drop that last char and parse the remaining hex digits.
    private Integer getIntValue(String str) {
        if (str.length() > 1) {
            return Integer.parseInt(str.substring(0, str.length() - 1), 16);
        }
        return null;
    }
}
 
Sheriff
Posts: 22684
Please UseCodeTags. You can use the edit button to add them.
 