Soap turning non-ascii chars to garbage

David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Hi all,

I have a web application that composes a SOAP message and sends it, via a web service, to another application to be read. If I add non-ASCII characters to the SOAP message (e.g. umlauts), they turn into garbage before the message is sent.

Does anyone know what I need to do to the SOAP message before I send it so that non-ASCII characters are recognised?

Any suggestions welcome.

Thanks,
David
Shashank Ag
Ranch Hand

Joined: Dec 22, 2009
Posts: 88

You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.


SCJP 91%, SCWCD 97%
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Shashank Ag wrote:You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.


Does that mean [CDATA]?
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
From the internet, I have composed the following. When using UTF-8, the umlaut comes out garbled. When using UTF-16, pretty much the entire message is garbled. Does anyone have any ideas?
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

Hi David,
Did you try escaping those characters, if there are only a few of them?
For example (Note: no spaces between the characters in escape sequences, otherwise they would appear as they are after I post this message)
ù with & # 2 4 9;
à with & # 2 2 4;
é with & # 2 3 3;
ì with & # 2 3 6;
ø with & # 2 4 8;


Cheers,
Naren
(OCEEJBD6, SCWCD5, SCDJWS, SCJP1.4 and Oracle SQL 1Z0-051)
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:Hi David,
Did you try escaping those characters, if there are only a few of them?
For example (Note: no spaces between the characters in escape sequences, otherwise they would appear as they are after I post this message)
ù with & # 2 4 9;
à with & # 2 2 4;
é with & # 2 3 3;
ì with & # 2 3 6;
ø with & # 2 4 8;


Thanks for your reply, Naren.

Using your method, would that mean I would have to create a list of all possible non-ASCII characters, then search for them in each SOAP message and replace them before sending? Seems a little cumbersome...
Lingan Rajan
Ranch Hand

Joined: Jan 26, 2011
Posts: 30
How about using a regex to search and replace?
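
The search-and-replace idea does not need a hand-made list of characters, since the number in each escape is just the character's Unicode code point. A minimal sketch, assuming plain Java and a hypothetical helper class named NumericEscaper (not part of any SOAP library):

```java
final class NumericEscaper {
    private NumericEscaper() {}

    /** Replaces every code point above 127 with a decimal XML character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint > 127) {
                // Non-ASCII: emit the numeric reference, e.g. 252 for u-umlaut.
                out.append("&#").append(codePoint).append(';');
            } else {
                // ASCII is copied through unchanged.
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00fc is the u-umlaut, written as an escape so the source file's encoding cannot interfere.
        System.out.println(escapeNonAscii("M\u00fcller")); // prints M&#252;ller
    }
}
```

Note that this sketch leaves &, < and > alone, and that an XML API which escapes text itself would probably turn the & of each reference into &amp;, so it only makes sense when you write the raw XML text yourself.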
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

No. Don't muck about with the data. Just send the data with the correct encoding. And also don't muck about with the data before you send it either. There's a good chance that it isn't SOAP's fault but the fault of some other code which screwed up the data before sending it. Here's some reading material for you:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Character Conversions from Browser to Database

A reintroduction to XML with an emphasis on character encoding
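
To make the "correct encoding" point concrete, here is a small sketch of how this kind of garbage usually arises: the text is encoded with one charset and decoded with another. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

final class MojibakeDemo {
    private MojibakeDemo() {}

    /** Encodes text to bytes with one charset, then decodes those bytes with another. */
    static String encodeThenDecode(String text, String encodeWith, String decodeWith) {
        byte[] wire = text.getBytes(Charset.forName(encodeWith));
        return new String(wire, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // u-umlaut

        // Same charset on both sides: the character survives as one char.
        System.out.println(encodeThenDecode(umlaut, "UTF-8", "UTF-8").length());      // prints 1

        // Mismatched charsets: the two UTF-8 bytes are read as two Latin-1 characters.
        System.out.println(encodeThenDecode(umlaut, "UTF-8", "ISO-8859-1").length()); // prints 2
    }
}
```

If the garbage you see is two odd characters where one umlaut should be, a mismatch like the second call is a likely suspect.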
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Shashank Ag wrote:You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.


Can you explain how I would use CDATA with JAXB? Is it supported?
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

Hi David,
Seems a little cumbersome...

Well, I don't recommend doing it unless, as I said, you only have a few characters to convert.

What I can suggest as a simpler approach is to convert the bytes of your original XML string (with the umlaut characters) to UTF-8 using the Java String methods. You need to know the original encoding of the XML string for this conversion. Then write the bytes to the SOAP message. Your non-ASCII characters are now encoded as UTF-8 and can be transmitted safely. Once you read these bytes at the other end, you have to reverse the conversion to get the original String.
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:Hi David,
Seems a little cumbersome...

Well, I don't recommend doing it unless, as I said, you only have a few characters to convert.

What I can suggest as a simpler approach is to convert the bytes of your original XML string (with the umlaut characters) to UTF-8 using the Java String methods. You need to know the original encoding of the XML string for this conversion. Then write the bytes to the SOAP message. Your non-ASCII characters are now encoded as UTF-8 and can be transmitted safely. Once you read these bytes at the other end, you have to reverse the conversion to get the original String.


Thanks for your response Naren.

The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect). I have a Java method that reads the outgoing message as a string just before it hits the wire. The special character is always garbled.
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

Hi David,
The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect).

Interoperable web services that comply with WS-I support only UTF-8 or UTF-16, so explicitly setting any other encoding may cause parsing to fail during unmarshalling.

What is the character encoding of your original XML string? Did you try converting it to UTF-8 before writing the bytes to your SOAP message?
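
On the encoding point, what matters is that the encoding named in the XML declaration matches the bytes actually written. A rough sketch using only the JDK's DOM and Transformer classes on a plain one-element XML document (not a real SOAP envelope; the class and element names are assumptions for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XmlEncodingRoundTrip {
    private XmlEncodingRoundTrip() {}

    /** Builds <name>...</name> and serializes it to bytes in the requested encoding. */
    static byte[] serialize(String name, String encoding) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("name");
            root.setTextContent(name);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // The serializer writes this encoding into the XML declaration and uses it for the bytes.
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    /** Parses the bytes; the parser picks the charset from the XML declaration. */
    static String parse(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String original = "M\u00fcller";
        String roundTripped = parse(serialize(original, "UTF-8"));
        System.out.println(original.equals(roundTripped));
    }
}
```

The point of the sketch is that bytes are handed to the parser, never a String built with some other charset in between.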
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:Hi David,
The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect).

Interoperable web services that comply with WS-I support only UTF-8 or UTF-16, so explicitly setting any other encoding may cause parsing to fail during unmarshalling.

What is the character encoding of your original XML string? Did you try converting it to UTF-8 before writing the bytes to your SOAP message?


Yes, as below:


From what I read, UTF-8 is the default. Makes no difference.
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

I think you haven't quite understood me. Never mind! Try to use this code snippet.
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:I think you haven't quite understood me. Never mind! Try to use this code snippet.

byte[] utf8Bytes=new String(latingString.getBytes(), isoCharset).getBytes(UTF_8);


My compiler is complaining about this line. Cannot find symbol.
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

Make sure you are using Java6!
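
For reference, a self-contained sketch of that kind of conversion, with the charsets declared explicitly. The Charset-based String constructor and getBytes overload exist from Java 6 on. The class and constant names here are assumptions for illustration, not the variables from the original snippet.

```java
import java.nio.charset.Charset;

final class Transcode {
    private Transcode() {}

    // Both charsets are guaranteed to be available on every Java platform.
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Re-encodes bytes that are known to be ISO-8859-1 as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        // Decode with the encoding the bytes really have...
        String text = new String(latin1Bytes, ISO_8859_1);
        // ...then encode with the encoding the receiver expects.
        return text.getBytes(UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = latin1ToUtf8(new byte[] { (byte) 0xFC }); // 0xFC is u-umlaut in ISO-8859-1
        System.out.println(utf8.length); // prints 2
    }
}
```

Unlike the one-liner, this sketch starts from bytes rather than calling getBytes() with no argument, because that call uses the platform default charset and may not give ISO-8859-1 bytes at all.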
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:Make sure you are using Java6!


Thanks.

I'm still not sure how that fits in to my problem.

I have a front-end form where the user types personal details. Sometimes the details are entered with special characters. The contents of the form are sent to my Java app via HTTP POST, encoded in UTF-8. When the app receives the data, it creates a SoapMessage with the personal details. The SOAP message is then sent out over the wire. The Java method I have written to test the sent messages confirms that the characters are still garbled. It is difficult for me to dissect the inside of the SOAP message and change it, e.g. by adding CDATA.
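
One thing worth checking in a setup like this is the very first step: whether the posted form value is decoded with the same charset the browser used. A minimal sketch with java.net.URLDecoder standing in for the servlet container (the class name and sample value are made up for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class FormDecoding {
    private FormDecoding() {}

    /** Decodes one percent-encoded form value using the given charset name. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser posting "M\u00fcller" from a UTF-8 page sends the umlaut as %C3%BC.
        String posted = "M%C3%BCller";

        // Decoded as UTF-8 the name has 6 characters; decoded as ISO-8859-1 it has 7.
        System.out.println(decode(posted, "UTF-8").length());      // prints 6
        System.out.println(decode(posted, "ISO-8859-1").length()); // prints 7
    }
}
```

In a servlet, the equivalent is usually a call to request.setCharacterEncoding("UTF-8") before the first parameter is read. If the value is already wrong at that point, nothing done to the SOAP message later will repair it.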
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

Hi David,
I have provided a generic solution (if it works) and you have to fit it to your requirement. If your SOAP message details are coming from a posted HTML form, then the encoding has to be changed on the front end from UTF-8 to ISO-8859-1 so that the correct characters reach your application before it sends the SOAP request. If you are using JSP, you can do it with <%@ page contentType="text/html; charset=ISO-8859-1" %>. You may have to use trial and error to make it work for you.
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:Hi David,
I have provided a generic solution (if it works) and you have to fit it to your requirement. If your SOAP message details are coming from a posted HTML form, then the encoding has to be changed on the front end from UTF-8 to ISO-8859-1 so that the correct characters reach your application before it sends the SOAP request. If you are using JSP, you can do it with <%@ page contentType="text/html; charset=ISO-8859-1" %>. You may have to use trial and error to make it work for you.


In the servlet that receives the data from my form, I have the following line:

This sets the encoding to UTF-8. I have changed it to both UTF-16 and ISO-8859-1, but neither has worked.
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

The above character set encoding is for rendering the response content. Can you try putting <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> at the top of your JSP, which should send the form data in ISO-8859-1 encoding?
David McWilliams
Ranch Hand

Joined: Mar 14, 2009
Posts: 73
Naren Chivukula wrote:The above character set encoding is for rendering the response content. Can you try putting <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> at the top of your JSP, which should send the form data in ISO-8859-1 encoding?


Thanks for your reply Naren. I have added the line above but it had no effect...
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

All I can say at the moment is to make sure you are getting proper XML data just before sending the request, by logging it to a file that can render ISO-8859-1 characters properly. If you manage to get that, apply the code snippet I provided and it should hopefully work. It's hard to understand what's going wrong even after applying the encoding configuration in your JSP page.
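
When logging for this purpose, it may be more reliable to log the raw bytes as hex than to log text, because a log viewer applies its own charset. A small sketch of such a helper (the class name HexDump is made up for illustration):

```java
import java.nio.charset.Charset;

final class HexDump {
    private HexDump() {}

    /** Formats bytes as lowercase two-digit hex values separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // u-umlaut
        System.out.println(toHex(umlaut.getBytes(Charset.forName("UTF-8"))));      // prints c3 bc
        System.out.println(toHex(umlaut.getBytes(Charset.forName("ISO-8859-1")))); // prints fc
    }
}
```

For a u-umlaut in the outgoing message, c3 bc means correct UTF-8, a lone fc means the bytes are ISO-8859-1, and c3 83 c2 bc means the text was already mis-decoded once and then encoded as UTF-8 again.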
Naren Chivukula
Ranch Hand

Joined: Feb 03, 2004
Posts: 576

I tried this on my JSP and it worked properly when I displayed back what I had supplied in the form.
<%@ page language="java" pageEncoding="ISO-8859-1"%>
 