
Replacing character encodings with the proper character

 
Nick Foster
Greenhorn
Posts: 6
Java Netbeans IDE Ubuntu

I was reading in bytes from an image based on an OpenLayers canvas.

After I read it in and converted it to a String, it contained a lot of encodings like %20, %7E, etc.

So I created a method that very tediously checks for each encoding and, if it is present, replaces it with the proper character: data = data.replace("%20", " ");

This process is 95 lines long, and I was wondering if there is a more elegant solution. I don't want to remove these encodings; I need to replace them so that the image object can be read and properly created on the server side for further processing.

The process I have works; I would just like to know what other approaches there are.

This is the code as it currently exists:

private String replaceEncodings(String data) {

data = data.replace("%20", " ");
data = data.replace("%21", "!");
data = data.replace("%22", "\"");
data = data.replace("%23", "#");
data = data.replace("%24", "$");
data = data.replace("%25", "%");
data = data.replace("%26", "&");
data = data.replace("%27", "\'");
data = data.replace("%28", "(");
data = data.replace("%29", ")");
data = data.replace("%2A", "*");
data = data.replace("%2B", "+");
data = data.replace("%2C", ",");
data = data.replace("%2D", "-");
data = data.replace("%2E", ".");
data = data.replace("%2F", "/");
data = data.replace("%30", "0");
data = data.replace("%31", "1");
data = data.replace("%32", "2");
data = data.replace("%33", "3");
data = data.replace("%34", "4");
data = data.replace("%35", "5");
data = data.replace("%36", "6");
data = data.replace("%37", "7");
data = data.replace("%38", "8");
data = data.replace("%39", "9");
data = data.replace("%3A", ":");
data = data.replace("%3B", ";");
data = data.replace("%3C", "<");
data = data.replace("%3D", "=");
data = data.replace("%3E", ">");
data = data.replace("%3F", "?");
data = data.replace("%40", "@");
data = data.replace("%41", "A");
data = data.replace("%42", "B");
data = data.replace("%43", "C");
data = data.replace("%44", "D");
data = data.replace("%45", "E");
data = data.replace("%46", "F");
data = data.replace("%47", "G");
data = data.replace("%48", "H");
data = data.replace("%49", "I");
data = data.replace("%4A", "J");
data = data.replace("%4B", "K");
data = data.replace("%4C", "L");
data = data.replace("%4D", "M");
data = data.replace("%4E", "N");
data = data.replace("%4F", "O");
data = data.replace("%50", "P");
data = data.replace("%51", "Q");
data = data.replace("%52", "R");
data = data.replace("%53", "S");
data = data.replace("%54", "T");
data = data.replace("%55", "U");
data = data.replace("%56", "V");
data = data.replace("%57", "W");
data = data.replace("%58", "X");
data = data.replace("%59", "Y");
data = data.replace("%5A", "Z");
data = data.replace("%5B", "[");
data = data.replace("%5C", "\\");
data = data.replace("%5D", "]");
data = data.replace("%5E", "^");
data = data.replace("%5F", "_");
data = data.replace("%60", "`");
data = data.replace("%61", "a");
data = data.replace("%62", "b");
data = data.replace("%63", "c");
data = data.replace("%64", "d");
data = data.replace("%65", "e");
data = data.replace("%66", "f");
data = data.replace("%67", "g");
data = data.replace("%68", "h");
data = data.replace("%69", "i");
data = data.replace("%6A", "j");
data = data.replace("%6B", "k");
data = data.replace("%6C", "l");
data = data.replace("%6D", "m");
data = data.replace("%6E", "n");
data = data.replace("%6F", "o");
data = data.replace("%70", "p");
data = data.replace("%71", "q");
data = data.replace("%72", "r");
data = data.replace("%73", "s");
data = data.replace("%74", "t");
data = data.replace("%75", "u");
data = data.replace("%76", "v");
data = data.replace("%77", "w");
data = data.replace("%78", "x");
data = data.replace("%79", "y");
data = data.replace("%7A", "z");
data = data.replace("%7B", "{");
data = data.replace("%7C", "|");
data = data.replace("%7D", "}");
data = data.replace("%7E", "~");
// "%80" is outside the ASCII range and is not a backtick (that is "%60", handled above), so it is not replaced here.

return data;
}
 
Paul Clapham
Sheriff
Pie
Posts: 20966
31
Eclipse IDE Firefox Browser MySQL Database
It looks to me like you're dealing with URL-encoded data. In which case the URLDecoder class (follow the link) is what you're looking for.
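
As a rough illustration of that suggestion, here is a minimal sketch of replacing the long chain of replace() calls with a single call to java.net.URLDecoder. The class name PercentDecodeSketch and the sample input are made up for the example; only URLDecoder.decode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, named only for this example.
final class PercentDecodeSketch {

    // Decodes every %XX escape in one pass, interpreting the bytes as UTF-8.
    static String decode(String data) {
        try {
            return URLDecoder.decode(data, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input containing two escapes.
        System.out.println(decode("a%20b%7Ec")); // prints "a b~c"
    }
}
```

Note that URLDecoder also turns every "+" into a space, which matters if the data can contain literal plus signs.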
 
Bear Bibeault
Author and ninkuma
Marshal
Pie
Posts: 64708
86
IntelliJ IDE Java jQuery Mac Mac OS X
Yup, no need to reinvent the wheel. And you should check out its friend URLEncoder.
 
Nick Foster
Greenhorn
Posts: 6
Java Netbeans IDE Ubuntu
Paul Clapham wrote: It looks to me like you're dealing with URL-encoded data. In which case the URLDecoder class (follow the link) is what you're looking for.


Thank you! I wrote this a few months ago. I was searching like a madman over the internet, and the best I found was a page showing the codes and their character representations. So I wrote this.

Thank you for the help.
 
Nick Foster
Greenhorn
Posts: 6
Java Netbeans IDE Ubuntu
Okay, I implemented this, but I was getting an error and the image file was not being created properly.

This is how I use URLDecoder:
file = URLDecoder.decode(file, "UTF-8");



An excerpt from the binary that works using my own method:
myn+m8k8l5O+ms


An excerpt from the binary that does not work using URLDecoder:
myn m8k8l5O ms

It seems URLDecoder is placing a space where a plus sign should be. Any suggestions? Currently I just went back to what I know works, but I'd like to do this without it. Would it be okay to replace all spaces with a plus sign? I think I should avoid that.



 
Bear Bibeault
Author and ninkuma
Marshal
Pie
Posts: 64708
86
IntelliJ IDE Java jQuery Mac Mac OS X
In a URL the plus sign is the encoding for a space. Spaces are not legal in a URL and must themselves be encoded. URLDecoder is doing the correct thing.
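
A small sketch of that behaviour, assuming the form-style encoding that URLEncoder and URLDecoder implement: a space is written as "+", and a literal plus sign is written as "%2B". The class name PlusSignSketch and the sample strings are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, named only for this example.
final class PlusSignSketch {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // The space becomes "+", and the literal plus becomes "%2B".
        System.out.println(encode("a b+c"));   // prints "a+b%2Bc"
        // Decoding reverses both steps.
        System.out.println(decode("a+b%2Bc")); // prints "a b+c"
    }
}
```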
 
Paul Clapham
Sheriff
Pie
Posts: 20966
31
Eclipse IDE Firefox Browser MySQL Database
Then it looks like the process which is "encoding" that binary data is using an incomplete version of URL-encoding. If you can't get that fixed then perhaps you could compensate for it by converting all "+" characters to the URL-encoded form of that character (which I think is "%2B" -- actually I know that because you said so in your original post) and then URL-decoding the result.
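
The workaround described here could be sketched roughly as follows. It assumes that every "+" in the incoming data is meant literally (as it would be in Base64 text) and that no "+" was intended as an encoded space. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, named only for this example.
final class LiteralPlusDecodeSketch {

    // Protects literal plus signs by escaping them first, then URL-decodes.
    // Assumption: the sender never uses "+" to mean a space.
    static String decodeKeepingPlus(String data) {
        String protectedData = data.replace("+", "%2B");
        try {
            return URLDecoder.decode(protectedData, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // The excerpt from the thread keeps its plus signs.
        System.out.println(decodeKeepingPlus("myn+m8k8l5O+ms")); // prints "myn+m8k8l5O+ms"
        // Other escapes are still decoded as usual.
        System.out.println(decodeKeepingPlus("a%20b+c"));        // prints "a b+c"
    }
}
```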
 
Bear Bibeault
Author and ninkuma
Marshal
Pie
Posts: 64708
86
IntelliJ IDE Java jQuery Mac Mac OS X
Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.
 
Nick Foster
Greenhorn
Posts: 6
Java Netbeans IDE Ubuntu
Bear Bibeault wrote: Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.


I do not encode the image. It's done by OpenLayers. I use the OpenLayers API to get the canvas which is then sent to the server that processes the image binary.

The incoming data starts with "data:image/png;base64", and then it has the binary stuff.

@Paul, I guess that's the best solution. I know that URLDecoder is not faulty; I just wasn't sure whether using a different encoding than UTF-8 might work better. But it will need to be UTF-8 at some point later.
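
For data in that shape, a minimal sketch of turning the already URL-decoded string back into image bytes might look like this. It assumes the standard data URL layout, where the Base64 payload follows the first "base64," marker; the class name, method name, and sample payload are made up for the example.

```java
import java.util.Base64;

// Hypothetical helper class, named only for this example.
final class DataUrlSketch {

    // Extracts and decodes the Base64 payload of a data URL such as
    // "data:image/png;base64,....". Assumes the string is already URL-decoded.
    static byte[] decodeDataUrl(String dataUrl) {
        String marker = "base64,";
        int start = dataUrl.indexOf(marker);
        if (!dataUrl.startsWith("data:") || start < 0) {
            throw new IllegalArgumentException("Not a Base64 data URL");
        }
        String payload = dataUrl.substring(start + marker.length());
        // Throws IllegalArgumentException if the payload is not valid Base64,
        // for example when "+" characters were turned into spaces.
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // Sample payload: the 8-byte PNG file signature, Base64-encoded.
        byte[] bytes = decodeDataUrl("data:image/png;base64,iVBORw0KGgo=");
        System.out.println(bytes.length); // prints 8
    }
}
```

Because the Base64 alphabet includes "+", a payload whose plus signs were decoded into spaces would be rejected by the decoder above, which matches the symptom described earlier in the thread.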

Thanks Bear and Paul.
 