
Replacing character encodings with the proper character

 
Nick Foster
Greenhorn

I was reading in the bytes of an image generated from an OpenLayers canvas.

After reading it in and converting it to a String, I found a lot of escape sequences such as %20, %7E, etc.

So I wrote a method that tediously checks for each escape sequence and, if it is present, replaces it with the proper character: data = data.replace("%20", " ");

The method is 95 lines long, and I was wondering if there is a more elegant solution. I don't want to remove these sequences; I need to replace them so that the image object can be read and properly created on the server side for further processing.

The process I have works, I would just like to know what other approaches there are.

This is the code as it currently exists:

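// Replaces each URL escape sequence (%XX) with the character it stands for.
// Caveat: "%25" is decoded early, so an input such as "%2541" first becomes
// "%41" and is then decoded a second time to "A".
private String replaceEncodings(String data) {
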
data = data.replace("%20", " ");
data = data.replace("%21", "!");
data = data.replace("%22", "\"");
data = data.replace("%23", "#");
data = data.replace("%24", "$");
data = data.replace("%25", "%");
data = data.replace("%26", "&");
data = data.replace("%27", "\'");
data = data.replace("%28", "(");
data = data.replace("%29", ")");
data = data.replace("%2A", "*");
data = data.replace("%2B", "+");
data = data.replace("%2C", ",");
data = data.replace("%2D", "-");
data = data.replace("%2E", ".");
data = data.replace("%2F", "/");
data = data.replace("%30", "0");
data = data.replace("%31", "1");
data = data.replace("%32", "2");
data = data.replace("%33", "3");
data = data.replace("%34", "4");
data = data.replace("%35", "5");
data = data.replace("%36", "6");
data = data.replace("%37", "7");
data = data.replace("%38", "8");
data = data.replace("%39", "9");
data = data.replace("%3A", ":");
data = data.replace("%3B", ";");
data = data.replace("%3C", "<");
data = data.replace("%3D", "=");
data = data.replace("%3E", ">");
data = data.replace("%3F", "?");
data = data.replace("%40", "@");
data = data.replace("%41", "A");
data = data.replace("%42", "B");
data = data.replace("%43", "C");
data = data.replace("%44", "D");
data = data.replace("%45", "E");
data = data.replace("%46", "F");
data = data.replace("%47", "G");
data = data.replace("%48", "H");
data = data.replace("%49", "I");
data = data.replace("%4A", "J");
data = data.replace("%4B", "K");
data = data.replace("%4C", "L");
data = data.replace("%4D", "M");
data = data.replace("%4E", "N");
data = data.replace("%4F", "O");
data = data.replace("%50", "P");
data = data.replace("%51", "Q");
data = data.replace("%52", "R");
data = data.replace("%53", "S");
data = data.replace("%54", "T");
data = data.replace("%55", "U");
data = data.replace("%56", "V");
data = data.replace("%57", "W");
data = data.replace("%58", "X");
data = data.replace("%59", "Y");
data = data.replace("%5A", "Z");
data = data.replace("%5B", "[");
data = data.replace("%5C", "\\");
data = data.replace("%5D", "]");
data = data.replace("%5E", "^");
data = data.replace("%5F", "_");
data = data.replace("%60", "`");
data = data.replace("%61", "a");
data = data.replace("%62", "b");
data = data.replace("%63", "c");
data = data.replace("%64", "d");
data = data.replace("%65", "e");
data = data.replace("%66", "f");
data = data.replace("%67", "g");
data = data.replace("%68", "h");
data = data.replace("%69", "i");
data = data.replace("%6A", "j");
data = data.replace("%6B", "k");
data = data.replace("%6C", "l");
data = data.replace("%6D", "m");
data = data.replace("%6E", "n");
data = data.replace("%6F", "o");
data = data.replace("%70", "p");
data = data.replace("%71", "q");
data = data.replace("%72", "r");
data = data.replace("%73", "s");
data = data.replace("%74", "t");
data = data.replace("%75", "u");
data = data.replace("%76", "v");
data = data.replace("%77", "w");
data = data.replace("%78", "x");
data = data.replace("%79", "y");
data = data.replace("%7A", "z");
data = data.replace("%7B", "{");
data = data.replace("%7C", "|");
data = data.replace("%7D", "}");
data = data.replace("%7E", "~");

return data;
}
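For comparison, the same job can be done in one pass instead of one replace call per code. This is a minimal sketch, not the poster's method: the class name PercentDecodeSketch is hypothetical, it assumes the escaped bytes are UTF-8, and it deliberately leaves '+' alone (unlike URLDecoder). Because each %XX is decoded exactly once, "%2541" comes out as "%41" rather than "A".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PercentDecodeSketch {

    // Hypothetical single-pass replacement for the 95-line method.
    // Decodes every valid %XX escape exactly once and copies everything else
    // (including '+') through unchanged. Assumes the escaped bytes are UTF-8.
    static String replaceEncodings(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length()) {
            if (data.charAt(i) == '%' && i + 2 < data.length()) {
                int hi = Character.digit(data.charAt(i + 1), 16);
                int lo = Character.digit(data.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // one decoded byte
                    i += 3;
                    continue;
                }
            }
            // Not a valid escape: copy this code point through as UTF-8 bytes.
            int cp = data.codePointAt(i);
            byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            i += Character.charCount(cp);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(replaceEncodings("%7E%20hello%21")); // ~ hello!
        System.out.println(replaceEncodings("myn+m8k8l5O+ms")); // myn+m8k8l5O+ms
        System.out.println(replaceEncodings("%2541"));          // %41
    }
}
```

A lone or malformed '%' (as in "100%") is simply copied through rather than rejected; that is a design choice of this sketch.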
 
Paul Clapham
Marshal
It looks to me like you're dealing with URL-encoded data. In that case the java.net.URLDecoder class is what you're looking for.
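A minimal sketch of that suggestion, assuming Java 10 or later for the URLDecoder.decode(String, Charset) overload (older code can call decode(String, "UTF-8"), which declares UnsupportedEncodingException). The class name is hypothetical. Note that URLDecoder also turns '+' into a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlDecoderSketch {

    // Hypothetical one-line replacement for replaceEncodings(): URLDecoder
    // handles every %XX escape, not just the ones in a hand-written table.
    // Caution: it also decodes '+' to a space (HTML form encoding rules).
    static String replaceEncodings(String data) {
        return URLDecoder.decode(data, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        System.out.println(replaceEncodings("%7E%20hello%21")); // ~ hello!
    }
}
```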
 
Bear Bibeault
Sheriff
Yup, no need to reinvent the wheel. And you should check out its friend URLEncoder.
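A small round trip through the two classes shows how they pair up: URLEncoder writes a space as '+' and a literal '+' as %2B, so URLDecoder reads any '+' it sees as a space. The class name is hypothetical; the Charset overloads need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RoundTripSketch {

    // Form-encodes a string: space becomes '+', '+' becomes %2B.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Reverses encode(): '+' becomes space, %XX becomes the original character.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("a b+c~");
        System.out.println(encoded);         // a+b%2Bc%7E
        System.out.println(decode(encoded)); // a b+c~
    }
}
```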
 
Nick Foster
Greenhorn

Paul Clapham wrote: It looks to me like you're dealing with URL-encoded data. In which case the URLDecoder class is what you're looking for.

Thank you! I wrote this a few months ago. I was searching like a madman all over the internet, and the best I found was a page listing the codes and their character representations, so I wrote this.

Thank you for the help.
 
Nick Foster
Greenhorn
Okay, I implemented this, but I got an error and the image file was not created properly.

This is how I use URLDecoder:
file = URLDecoder.decode(file, "UTF-8");

An excerpt from the binary that works using my own method:
myn+m8k8l5O+ms

An excerpt from the binary that does not work using URLDecoder:
myn m8k8l5O ms

It seems URLDecoder is putting a space where a plus sign should be. Any suggestions? For now I have gone back to what I know works, but I'd like to do this without it. Would it be okay to replace all spaces with plus signs? I think I should avoid that.

 
Bear Bibeault
Sheriff
In URL-encoded form data (application/x-www-form-urlencoded, which is what URLDecoder decodes), the plus sign is the encoding for a space. Spaces are not legal in a URL and must themselves be encoded. URLDecoder is doing the correct thing.
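The behaviour can be reproduced with the excerpt quoted above: a bare '+' decodes to a space, and only %2B decodes to a literal '+'. The class name is hypothetical; Java 10 or later is assumed for the Charset overload.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusAsSpaceDemo {

    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Excerpt from the thread: each '+' is read as an encoded space.
        System.out.println(decode("myn+m8k8l5O+ms"));     // myn m8k8l5O ms
        // A literal plus sign has to arrive as %2B to survive decoding.
        System.out.println(decode("myn%2Bm8k8l5O%2Bms")); // myn+m8k8l5O+ms
    }
}
```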
 
Paul Clapham
Marshal
Then it looks like the process which is "encoding" that binary data is using an incomplete version of URL-encoding. If you can't get that fixed then perhaps you could compensate for it by converting all "+" characters to the URL-encoded form of that character (which I think is "%2B" -- actually I know that because you said so in your original post) and then URL-decoding the result.
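A minimal sketch of that workaround, assuming the sender never uses '+' to mean a space (if it ever does, those spaces would come back as plus signs). The class and method names are hypothetical; Java 10 or later is assumed.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusPreservingDecode {

    // Escape every literal '+' as %2B first, so URLDecoder hands back '+'
    // instead of a space; all other %XX escapes are decoded as usual.
    static String decodeKeepingPlus(String data) {
        String escaped = data.replace("+", "%2B");
        return URLDecoder.decode(escaped, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeKeepingPlus("myn+m8k8l5O+ms")); // myn+m8k8l5O+ms
        System.out.println(decodeKeepingPlus("a%2Fb+c"));        // a/b+c
    }
}
```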
 
Bear Bibeault
Sheriff
Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.
 
Nick Foster
Greenhorn

Bear Bibeault wrote: Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.

I do not encode the image. It's done by OpenLayers. I use the OpenLayers API to get the canvas which is then sent to the server that processes the image binary.

The incoming data starts with "data:image/png;base64", followed by the encoded image data.
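A data URL of this kind (RFC 2397: data:[mediatype][;base64],data) carries the image as Base64 text, and '+' and '/' are ordinary characters in the standard Base64 alphabet, which is why the plus signs must survive decoding. A minimal sketch, assuming the payload arrives as "data:<type>;base64,<payload>" with some characters possibly %-escaped; the class and method names are hypothetical, java.util.Base64 needs Java 8 and the Charset overload of URLDecoder.decode needs Java 10.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUrlSketch {

    // Hypothetical helper: turns a received "data:image/png;base64,<payload>"
    // string into the raw image bytes.
    static byte[] decodeDataUrl(String received) {
        // Undo any %XX escapes, protecting literal '+' (a Base64 character) first.
        String text = URLDecoder.decode(received.replace("+", "%2B"), StandardCharsets.UTF_8);

        int comma = text.indexOf(',');
        if (!text.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("not a data URL");
        }
        String header = text.substring(0, comma); // e.g. "data:image/png;base64"
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("payload is not Base64-encoded");
        }
        // The standard (not URL-safe) Base64 alphabet uses '+' and '/'.
        return Base64.getDecoder().decode(text.substring(comma + 1));
    }

    public static void main(String[] args) {
        byte[] bytes = decodeDataUrl("data:image/png;base64,+/8=");
        System.out.println(bytes.length); // 2
    }
}
```

The resulting byte array could then be written to a file or handed to an image reader on the server side.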

@Paul, I guess that's the best solution. I know URLDecoder isn't faulty; I just wasn't sure whether a charset other than UTF-8 would work better. But it will need to be UTF-8 at some point later anyway.
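On the charset question: the charset passed to URLDecoder only decides how escaped bytes above 0x7F are combined into characters; it has no effect on how '+' is treated. A small illustration (class name hypothetical, Java 10 or later assumed):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class CharsetChoiceDemo {

    static String decodeUtf8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static String decodeLatin1(String s) {
        return URLDecoder.decode(s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // ASCII escapes and '+' decode identically under either charset.
        System.out.println(decodeUtf8("a+b%2Bc") + " | " + decodeLatin1("a+b%2Bc")); // a b+c | a b+c
        // The charset matters only for bytes above 0x7F: C3 A9 is one character
        // (U+00E9) in UTF-8 but two characters (U+00C3 U+00A9) in ISO-8859-1.
        System.out.println(decodeUtf8("%C3%A9").length() + " vs " + decodeLatin1("%C3%A9").length()); // 1 vs 2
    }
}
```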

Thanks Bear and Paul.
 