Replacing character encodings with the proper character

Nick Foster
Greenhorn

Joined: Feb 19, 2013
Posts: 6


I was reading in bytes from an image based on an OpenLayers canvas.

After I read it in and converted it to a String, it contained a lot of encodings like %20, %7E, etc.

So I created a method that very tediously checks for each encoding and, if it is present, replaces it with the proper character: data = data.replace("%20", " ");

This process is 95 lines long and I was wondering if there is a more elegant solution. I don't want to remove these encodings; I need to replace them so that the image object can be read and properly created on the server side for further processing.

The process I have works; I would just like to know what other approaches there are.

This is the code as it currently exists:

private String replaceEncodings(String data) {

data = data.replace("%20", " ");
data = data.replace("%21", "!");
data = data.replace("%22", "\"");
data = data.replace("%23", "#");
data = data.replace("%24", "$");
data = data.replace("%25", "%");
data = data.replace("%26", "&");
data = data.replace("%27", "\'");
data = data.replace("%28", "(");
data = data.replace("%29", ")");
data = data.replace("%2A", "*");
data = data.replace("%2B", "+");
data = data.replace("%2C", ",");
data = data.replace("%2D", "-");
data = data.replace("%2E", ".");
data = data.replace("%2F", "/");
data = data.replace("%30", "0");
data = data.replace("%31", "1");
data = data.replace("%32", "2");
data = data.replace("%33", "3");
data = data.replace("%34", "4");
data = data.replace("%35", "5");
data = data.replace("%36", "6");
data = data.replace("%37", "7");
data = data.replace("%38", "8");
data = data.replace("%39", "9");
data = data.replace("%3A", ":");
data = data.replace("%3B", ";");
data = data.replace("%3C", "<");
data = data.replace("%3D", "=");
data = data.replace("%3E", ">");
data = data.replace("%3F", "?");
data = data.replace("%40", "@");
data = data.replace("%41", "A");
data = data.replace("%42", "B");
data = data.replace("%43", "C");
data = data.replace("%44", "D");
data = data.replace("%45", "E");
data = data.replace("%46", "F");
data = data.replace("%47", "G");
data = data.replace("%48", "H");
data = data.replace("%49", "I");
data = data.replace("%4A", "J");
data = data.replace("%4B", "K");
data = data.replace("%4C", "L");
data = data.replace("%4D", "M");
data = data.replace("%4E", "N");
data = data.replace("%4F", "O");
data = data.replace("%50", "P");
data = data.replace("%51", "Q");
data = data.replace("%52", "R");
data = data.replace("%53", "S");
data = data.replace("%54", "T");
data = data.replace("%55", "U");
data = data.replace("%56", "V");
data = data.replace("%57", "W");
data = data.replace("%58", "X");
data = data.replace("%59", "Y");
data = data.replace("%5A", "Z");
data = data.replace("%5B", "[");
data = data.replace("%5C", "\\");
data = data.replace("%5D", "]");
data = data.replace("%5E", "^");
data = data.replace("%5F", "_");
data = data.replace("%60", "`");
data = data.replace("%61", "a");
data = data.replace("%62", "b");
data = data.replace("%63", "c");
data = data.replace("%64", "d");
data = data.replace("%65", "e");
data = data.replace("%66", "f");
data = data.replace("%67", "g");
data = data.replace("%68", "h");
data = data.replace("%69", "i");
data = data.replace("%6A", "j");
data = data.replace("%6B", "k");
data = data.replace("%6C", "l");
data = data.replace("%6D", "m");
data = data.replace("%6E", "n");
data = data.replace("%6F", "o");
data = data.replace("%70", "p");
data = data.replace("%71", "q");
data = data.replace("%72", "r");
data = data.replace("%73", "s");
data = data.replace("%74", "t");
data = data.replace("%75", "u");
data = data.replace("%76", "v");
data = data.replace("%77", "w");
data = data.replace("%78", "x");
data = data.replace("%79", "y");
data = data.replace("%7A", "z");
data = data.replace("%7B", "{");
data = data.replace("%7C", "|");
data = data.replace("%7D", "}");
data = data.replace("%7E", "~");
// "%80" is not an ASCII character, and "%60" above already maps to "`", so it is not replaced here.

return data;
}
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18675

It looks to me like you're dealing with URL-encoded data. In which case the java.net.URLDecoder class is what you're looking for.
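
As a minimal sketch of this suggestion, the 95-line method could collapse to a single call. The class name ReplaceEncodingsSketch is hypothetical; the method name replaceEncodings comes from the original post. One caveat to keep in mind: URLDecoder follows form-encoding rules, so besides the %XX escapes it also turns every '+' into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class ReplaceEncodingsSketch {

    // Decodes every %XX escape in one call instead of one replace() per character.
    // Caveat: URLDecoder also converts '+' to a space.
    static String replaceEncodings(String data) {
        try {
            return URLDecoder.decode(data, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceEncodings("Hello%20World%7E%21")); // prints "Hello World~!"
    }
}
```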
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61437

Yup, no need to reinvent the wheel. And you should check out its friend URLEncoder.
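
To illustrate how the two classes pair up, here is a small round-trip sketch. The class name UrlRoundTripSketch and the sample string are made up for illustration. It shows that URLEncoder writes a space as '+' and a literal plus sign as "%2B", which is exactly what URLDecoder expects to read back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class UrlRoundTripSketch {

    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String original = "a b+c~";
        String encoded = encode(original);
        System.out.println(encoded);         // prints "a+b%2Bc%7E"
        System.out.println(decode(encoded)); // prints "a b+c~"
    }
}
```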


Nick Foster
Greenhorn

Joined: Feb 19, 2013
Posts: 6

Paul Clapham wrote: It looks to me like you're dealing with URL-encoded data. In which case the java.net.URLDecoder class is what you're looking for.


Thank you! I wrote this a few months ago. I was searching like a madman over the internet, and the best I found was a page showing the codes and their character representations, so I wrote this.

Thank you for the help.
Nick Foster
Greenhorn

Joined: Feb 19, 2013
Posts: 6

Okay, I implemented this, but I was getting an error and the image file was not being created properly.

This is how I use URLDecoder:
file = URLDecoder.decode(file, "UTF-8");



An excerpt from the binary that works using my own method:
myn+m8k8l5O+ms


An excerpt from the binary that does not work using URLDecoder:
myn m8k8l5O ms

It seems URLDecoder is placing a space where a plus sign should be. Any suggestions? Currently I just went back to what I know works, but I'd like to do this without it. Would it be okay to replace all spaces with a plus sign? I think I should avoid that.



Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61437

URLDecoder handles the application/x-www-form-urlencoded format, in which the plus sign is the encoding for a space. Spaces are not legal in a URL and must themselves be encoded. URLDecoder is doing the correct thing.
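
The behaviour described here can be reproduced with the excerpt from the post above. This is only a sketch; the class name PlusDecodingSketch is hypothetical. A bare '+' comes back as a space, while a properly encoded "%2B" comes back as a plus sign.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PlusDecodingSketch {

    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // A literal '+' is read as an encoded space.
        System.out.println(decode("myn+m8k8l5O+ms"));     // prints "myn m8k8l5O ms"
        // A plus sign survives only if the sender encoded it as %2B.
        System.out.println(decode("myn%2Bm8k8l5O%2Bms")); // prints "myn+m8k8l5O+ms"
    }
}
```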
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18675

Then it looks like the process which is "encoding" that binary data is using an incomplete version of URL-encoding. If you can't get that fixed then perhaps you could compensate for it by converting all "+" characters to the URL-encoded form of that character (which I think is "%2B" -- actually I know that because you said so in your original post) and then URL-decoding the result.
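
The compensation described above might look like the following sketch. The class and method names (PlusPreservingDecodeSketch, decodeKeepingPlus) are hypothetical, and the approach assumes that no '+' in the incoming data was ever meant to stand for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PlusPreservingDecodeSketch {

    // Re-encodes each literal '+' as "%2B" first, so that URLDecoder
    // restores it as a plus sign instead of turning it into a space.
    static String decodeKeepingPlus(String data) {
        String protectedData = data.replace("+", "%2B");
        try {
            return URLDecoder.decode(protectedData, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeKeepingPlus("myn+m8k8l5O+ms")); // prints "myn+m8k8l5O+ms"
        System.out.println(decodeKeepingPlus("a%20b%2Bc+d"));    // prints "a b+c+d"
    }
}
```

If the sender is ever fixed to encode spaces as '+', this workaround would have to be removed, since it would then turn genuine spaces into plus signs.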
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61437

Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.
Nick Foster
Greenhorn

Joined: Feb 19, 2013
Posts: 6

Bear Bibeault wrote: Yeah, if the encoding process is broken, it behooves you to fix that rather than trying to fiddle the decoding of a bad encoding.


I do not encode the image. It's done by OpenLayers. I use the OpenLayers API to get the canvas, which is then sent to the server that processes the image binary.

The incoming data starts with "data:image/png;base64", and then it has the binary stuff.

@Paul, I guess that's the best solution. I know that URLDecoder is not faulty; I just wasn't sure whether using a different encoding than UTF-8 might work better. But it will need to be UTF-8 at some point later.

Thanks Bear and Paul.
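
Since the incoming data starts with "data:image/png;base64", the plus signs are most likely part of the Base64 alphabet, which uses '+' as one of its 64 characters. That would explain why turning them into spaces corrupts the image. The sketch below is one possible way to get the image bytes on the server. It assumes the usual data URI layout (a header, a comma, then the Base64 payload) and Java 8 or later for java.util.Base64; the names DataUriSketch and imageBytes are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Base64;

final class DataUriSketch {

    // Assumed input layout: "data:image/png;base64," followed by the payload,
    // possibly with some characters percent-encoded by the sender.
    static byte[] imageBytes(String rawDataUri) {
        String decoded;
        try {
            // Protect literal '+' characters, then undo the %XX escapes.
            decoded = URLDecoder.decode(rawDataUri.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        int comma = decoded.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Not a data URI: no comma found");
        }
        // Everything after the first comma is the Base64 payload.
        return Base64.getDecoder().decode(decoded.substring(comma + 1));
    }

    public static void main(String[] args) {
        byte[] bytes = imageBytes("data:image/png;base64,%2Bw%3D%3D");
        System.out.println(bytes.length); // prints 1
    }
}
```

The resulting byte array could then be written to a file or passed to an image library; Base64.getDecoder() rejects input that contains spaces or line breaks, so a corrupted payload fails loudly instead of producing a broken image.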
 
 