SOAP turning non-ASCII chars to garbage

 
Ranch Hand
Posts: 77
Hi all,

I have a web application that composes a SOAP message and sends it, via a web service, to another application to be read. If I add non-ASCII characters to the SOAP message (e.g. umlauts), they turn into garbage before the message is sent.

Does anyone know what I need to do to the SOAP message before I send it so that non-ASCII characters are recognised?

Any suggestions welcome.

Thanks,
David
 
Ranch Hand
Posts: 88
You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.
 
David McWilliams
Ranch Hand
Posts: 77

Shashank Ag wrote:You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.



Does that mean [CDATA]?
 
David McWilliams
Ranch Hand
Posts: 77
Based on what I found on the internet, I have put together the following. With UTF-8, the umlaut comes out garbled. With UTF-16, pretty much the entire message is garbled. Does anyone have any ideas?
 
Ranch Hand
Posts: 577
Hi David,
Did you try escaping those characters, if there are only a few of them?
For example (note: the real escape sequences contain no spaces; I have added spaces only so that the forum does not render them as the characters themselves):
ù with & # 2 4 9;
à with & # 2 2 4;
é with & # 2 3 3;
ì with & # 2 3 6;
ø with & # 2 4 8;
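
The escaping listed above does not need a hand-made table of characters. A minimal sketch, assuming a hypothetical helper class `NumericEscaper` (not from the thread): it walks the string and replaces every code point above U+007F with its decimal numeric character reference. It is only useful when the XML is assembled as a raw string; if the text is handed to a DOM or JAXB API, the serializer will escape the `&` again and the reference will arrive as literal text.

```java
class NumericEscaper {

    /** Replaces each code point above U+007F with a decimal XML character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                // e.g. U+00FC (u with umlaut) becomes "&#252;"
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        System.out.println(escapeNonAscii("M\u00FCller")); // prints M&#252;ller
    }
}
```

This only hides the symptom: the output is pure ASCII, so a wrong charset somewhere in the chain can no longer damage it, but the wrong charset is still there.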
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:Hi David,
Did you try escaping those characters, if there are only a few of them?
For example (note: the real escape sequences contain no spaces; I have added spaces only so that the forum does not render them as the characters themselves):
ù with & # 2 4 9;
à with & # 2 2 4;
é with & # 2 3 3;
ì with & # 2 3 6;
ø with & # 2 4 8;



Thanks for your reply, Naren.

Using your method, would that mean I have to create a list of all possible non-ASCII characters, then search for them in each SOAP message and replace them before sending? Seems a little cumbersome...
 
Ranch Hand
Posts: 30
How about using a regex to search and replace?
 
Marshal
Posts: 28177
No. Don't muck about with the data. Just send the data with the correct encoding. And also don't muck about with the data before you send it either. There's a good chance that it isn't SOAP's fault but the fault of some other code which screwed up the data before sending it. Here's some reading material for you:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Character Conversions from Browser to Database

A reintroduction to XML with an emphasis on character encoding
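
To illustrate the kind of fault described in this post, here is a minimal sketch (the class name `MojibakeDemo` is made up for illustration) of how correctly encoded text turns to garbage when one step decodes the bytes with a different charset than the one used to encode them. Nothing in it is SOAP-specific, which is the point: the same damage can happen in any code that touches the data before the message is built.

```java
import java.nio.charset.Charset;

class MojibakeDemo {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    /** Encodes as UTF-8, then (wrongly) decodes the same bytes as ISO-8859-1. */
    static String garble(String text) {
        return new String(text.getBytes(UTF_8), ISO_8859_1);
    }

    /** Reverses garble(); works only if no bytes were lost along the way. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(ISO_8859_1), UTF_8);
    }

    public static void main(String[] args) {
        String original = "M\u00FCller";          // 6 characters
        String garbled = garble(original);
        // The umlaut's two UTF-8 bytes (C3 BC) became two separate characters.
        System.out.println(garbled.length());                 // prints 7
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

If the garbled text in the outgoing message shows two odd characters where one umlaut was expected, a mismatch of this kind is a likely suspect.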
 
David McWilliams
Ranch Hand
Posts: 77

Shashank Ag wrote:You should send your non-ASCII characters, or the whole strings containing them, as CDATA to avoid such problems.



Can you explain how I would use CDATA with JAXB? Is it supported?
 
Naren Chivukula
Ranch Hand
Posts: 577
Hi David,

Seems a little cumbersome...


Well, I don't recommend doing it unless, as I said, you only have a few characters to convert.

What I can suggest as a perhaps simpler approach is to convert the bytes of your original XML string (with the umlaut characters) to UTF-8 using Java String methods; you need to know the original encoding of the XML string for the conversion. Then write the bytes to the SOAP message. Your non-ASCII characters are now encoded (as in the example I provided) and can be transmitted safely. Once you read these bytes at the other end, you have to reverse the conversion to get the original String.
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:Hi David,

Seems a little cumbersome...


Well, I don't recommend doing it unless, as I said, you only have a few characters to convert.

What I can suggest as a perhaps simpler approach is to convert the bytes of your original XML string (with the umlaut characters) to UTF-8 using Java String methods; you need to know the original encoding of the XML string for the conversion. Then write the bytes to the SOAP message. Your non-ASCII characters are now encoded (as in the example I provided) and can be transmitted safely. Once you read these bytes at the other end, you have to reverse the conversion to get the original String.



Thanks for your response, Naren.

The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect). I have a Java method that reads the outgoing message as a string just before it hits the wire. The special character is always garbled.
 
Naren Chivukula
Ranch Hand
Posts: 577
Hi David,

The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect).


Interoperable web services complying with WS-I support only UTF-8 or UTF-16, so explicitly setting any other encoding might cause parsing to fail during unmarshalling.

What is the character encoding of your original XML string? Did you try converting it to UTF-8 before writing its bytes to your SOAP message?
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:Hi David,

The encoding used on the outgoing message is UTF-8, the default (I have tried setting it explicitly but it has no effect).


Interoperable web services complying with WS-I support only UTF-8 or UTF-16, so explicitly setting any other encoding might cause parsing to fail during unmarshalling.

What is the character encoding of your original XML string? Did you try converting it to UTF-8 before writing its bytes to your SOAP message?



Yes, as below:


From what I read, UTF-8 is the default. Makes no difference.
 
Naren Chivukula
Ranch Hand
Posts: 577
I think you haven't quite understood me. Never mind! Try to use this code snippet.
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:I think you haven't quite understood me. Never mind! Try to use this code snippet.

byte[] utf8Bytes=new String(latingString.getBytes(), isoCharset).getBytes(UTF_8);



My compiler is complaining about this line. Cannot find symbol.
 
Naren Chivukula
Ranch Hand
Posts: 577
Make sure you are using Java 6!
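
The one-line snippet quoted earlier does not declare `isoCharset`, `UTF_8` or `latingString`, which may be what the "cannot find symbol" error is about; the `Charset`-based overloads of the `String` constructor and `getBytes` were added in Java 6, which would also explain the error on an older JDK. A hedged sketch with assumed declarations (the class and method names are hypothetical, and the thread does not show how Naren actually defined these symbols):

```java
import java.nio.charset.Charset;

class CharsetSnippet {
    // Assumed declarations; the forum snippet does not show them.
    static final Charset ISO_CHARSET = Charset.forName("ISO-8859-1");
    static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Decodes bytes known to be ISO-8859-1, then re-encodes the text as UTF-8. */
    static byte[] latin1BytesToUtf8(byte[] latin1Bytes) {
        // String(byte[], Charset) and getBytes(Charset) both require Java 6 or later.
        return new String(latin1Bytes, ISO_CHARSET).getBytes(UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xFC };           // u with umlaut in ISO-8859-1
        byte[] utf8 = latin1BytesToUtf8(latin1);   // C3 BC in UTF-8
        System.out.println(utf8.length);           // prints 2
    }
}
```

Unlike the quoted line, this sketch starts from raw bytes and avoids the no-argument `getBytes()`, which uses the platform default charset and can therefore behave differently from one machine to the next.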
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:Make sure you are using Java 6!



Thanks.

I'm still not sure how that fits into my problem.

I have a front-end form where the user types personal details. Sometimes the details are entered with special characters. The contents of the form are sent to my Java app via HTTP POST, encoded in UTF-8. When the app receives the data, it creates a SoapMessage with the personal details. The SOAP message is then sent out over the wire. The Java method I have written to test the sent messages confirms that the characters are still garbled. It is difficult for me to dissect the inside of the SOAP message and change it, e.g. by adding CDATA.
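
One place where the flow described above can go wrong is the decoding of the posted form data. The sketch below uses `java.net.URLDecoder` as a stand-in for what a servlet container does when it parses a POST body; the class name `FormDecodingDemo` and the sample value are assumptions for illustration. It shows the same percent-encoded bytes decoded with the right and the wrong charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {

    /** Decodes a percent-encoded form value using the named charset. */
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser submitting a UTF-8 form sends the umlaut as %C3%BC.
        String posted = "M%C3%BCller";

        String right = decode(posted, "UTF-8");
        String wrong = decode(posted, "ISO-8859-1");

        System.out.println(right.equals("M\u00FCller"));       // prints true
        System.out.println(wrong.equals("M\u00C3\u00BCller")); // prints true
    }
}
```

Many servlet containers have historically fallen back to ISO-8859-1 for the request body when the request names no charset. In a servlet, calling `request.setCharacterEncoding("UTF-8")` before the first parameter is read is the usual way to select the charset; whether that is the cause here cannot be told from the thread.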
 
Naren Chivukula
Ranch Hand
Posts: 577
Hi David,
I have provided a generic solution (if it works) and you have to fit it to your requirements. If your SOAP message details are coming from a posted HTML form, then the encoding has to be changed on the front end from UTF-8 to ISO-8859-1 so that the correct characters reach your application before it sends the SOAP request. If you are using JSP, you can do that with <%@ page contentType="text/html; charset=ISO-8859-1" %>. You may have to use trial and error to make it work for you.
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:Hi David,
I have provided a generic solution (if it works) and you have to fit it to your requirements. If your SOAP message details are coming from a posted HTML form, then the encoding has to be changed on the front end from UTF-8 to ISO-8859-1 so that the correct characters reach your application before it sends the SOAP request. If you are using JSP, you can do that with <%@ page contentType="text/html; charset=ISO-8859-1" %>. You may have to use trial and error to make it work for you.



In the servlet that receives the data from my form, I have the following line:

This sets the encoding to UTF-8. I have changed it to both UTF-16 and ISO-8859-1, but neither has worked.
 
Naren Chivukula
Ranch Hand
Posts: 577
The character set encoding above is for rendering the response content. Can you try putting <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> at the top of your JSP? That should make the form send its data in ISO-8859-1 encoding.
 
David McWilliams
Ranch Hand
Posts: 77

Naren Chivukula wrote:The character set encoding above is for rendering the response content. Can you try putting <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> at the top of your JSP? That should make the form send its data in ISO-8859-1 encoding.



Thanks for your reply Naren. I have added the line above but it had no effect...
 
Naren Chivukula
Ranch Hand
Posts: 577
All I can say at the moment is to make sure you are getting proper XML data just before sending the request, by logging it to a file that can render ISO-8859-1 characters properly. If the data is correct at that point, apply the code snippet I provided and that should hopefully work. It's hard to understand what is going wrong even after applying the encoding configuration in your JSP page.
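
A check like the one suggested above is easier to trust when the log itself cannot be affected by encoding. A minimal sketch, assuming a hypothetical helper `CodePointDump`: it renders a string as a list of Unicode code points in plain ASCII, so the result reads the same in any log viewer.

```java
class CodePointDump {

    /** Renders each code point as U+XXXX, separated by spaces. */
    static String dump(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A correct umlaut is a single code point.
        System.out.println(dump("\u00FC"));        // prints U+00FC
        // A UTF-8 umlaut mis-decoded as ISO-8859-1 shows up as two.
        System.out.println(dump("\u00C3\u00BC"));  // prints U+00C3 U+00BC
    }
}
```

Logging the dump at each stage (after reading the form parameter, after building the message, just before sending) would show where a single `U+00FC` first turns into something else.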
 
Naren Chivukula
Ranch Hand
Posts: 577
I tried this in my JSP, and it worked properly when I displayed back what I had supplied in the form:
<%@ page language="java" pageEncoding="ISO-8859-1"%>
 