Because that XML is served over HTTP, the encoding specified in the Content-Type HTTP header actually matters more: it can override the encoding declared in the XML prolog.
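For illustration, here is a minimal Java sketch of reading the charset parameter out of a Content-Type header value, which is where the HTTP-level encoding lives. The class name `ContentTypeCharset` and the method `charsetOf` are hypothetical, and the parsing is deliberately simplified (it does not implement every quoting rule for media-type parameters).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/xml; charset=UTF-8", or the fallback if no charset parameter
     * is present. Charset.forName throws if the name is unknown.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/xml", StandardCharsets.US_ASCII));
    }
}
```

Which fallback to pass is a judgment call. RFC 3023, for instance, made us-ascii the default for text/xml sent without a charset parameter, which is one way XML content can end up being treated as us-ascii.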
See: XML on the Web Has Failed
Rashmi Anand wrote: Non-English characters like ô and ä are coming out as ? ? in the XML.
That may indicate that the real encoding is us-ascii, not UTF-8.
So somewhere your UTF-8 content is being forced into us-ascii encoding, and because us-ascii has no codes for the non-English characters, they are replaced with "?".
us-ascii content can safely be labeled as UTF-8, since every us-ascii byte means the same thing in UTF-8. The reverse does not hold: us-ascii cannot express most of the characters UTF-8 can.
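A quick way to reproduce the symptom in Java, assuming the lossy step is an encode to us-ascii somewhere in your pipeline: `String.getBytes(Charset)` does not throw on characters it cannot map; it silently substitutes the charset's replacement byte, which for US-ASCII is `?`. The class and method names below are made up for this sketch.

```java
import java.nio.charset.StandardCharsets;

class AsciiReplacement {

    /** Encodes text as US-ASCII and decodes it back, as a lossy pipeline would. */
    static String roundTripThroughAscii(String text) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's default replacement byte ('?' for US-ASCII).
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Unicode escapes for ô and ä keep the source file encoding-neutral.
        System.out.println(roundTripThroughAscii("\u00F4 \u00E4")); // prints "? ?"
        System.out.println(roundTripThroughAscii("plain text"));    // prints "plain text"
    }
}
```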
The characters ô and ä have Latin-1 decimal codes of 244 and 228 respectively (212 and 196 are the uppercase Ô and Ä). You can see them in the ISO Latin-1 Character Set table. Since both codes are above 127, neither character exists in us-ascii, so in your case they must be encoded in something wider, most likely Latin-1 or UTF-8.
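To check these codes yourself, this small Java sketch prints each character's code point along with its Latin-1 and UTF-8 byte sequences. The class name `CharacterCodes` is hypothetical. Note that Latin-1 uses one byte per character here, while UTF-8 uses two.

```java
import java.nio.charset.StandardCharsets;

class CharacterCodes {

    /** Formats bytes as space-separated two-digit hex, e.g. "C3 B4". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"\u00F4", "\u00E4"}) {
            int cp = s.codePointAt(0);
            System.out.printf("U+%04X decimal=%d Latin-1=%s UTF-8=%s%n",
                cp, cp,
                hex(s.getBytes(StandardCharsets.ISO_8859_1)),
                hex(s.getBytes(StandardCharsets.UTF_8)));
        }
        // prints:
        // U+00F4 decimal=244 Latin-1=F4 UTF-8=C3 B4
        // U+00E4 decimal=228 Latin-1=E4 UTF-8=C3 A4
    }
}
```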
One thing you can try is to intercept the data as you receive it and pass it through a converter that decodes it using the encoding it was actually sent in.
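As a rough sketch of such a converter in Java, assuming you can get at the raw bytes before anything decodes them: wrap the input in an `InputStreamReader` with an explicit charset, or re-encode a byte array from one charset to another. `Transcoder`, `decode`, and `transcode` are hypothetical names, and ISO-8859-1 is only an example source charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {

    /**
     * Reads the whole stream and decodes it with the charset the sender
     * actually used, instead of relying on the platform default.
     */
    static String decode(InputStream in, Charset actualCharset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, actualCharset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Re-encodes a byte array from one charset to another. */
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        return new String(data, from).getBytes(to);
    }

    public static void main(String[] args) {
        // "ô ä" as ISO-8859-1 bytes: one byte per character.
        byte[] latin1 = {(byte) 0xF4, (byte) 0x20, (byte) 0xE4};
        String text = decode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("\u00F4 \u00E4")); // true
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5: each accented character becomes two bytes
    }
}
```

This only helps before the lossy step: once a character has been replaced with "?", the original is gone and no converter can bring it back.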
Please let us know.
Peer, it's disheartening to realize that a single transmission can carry multiple encoding declarations: one in the HTTP header and one in the XML prolog. Does the MIME wrapper introduce one as well?
It would help to capture the entire HTTP response message, headers included, so you can see exactly which encoding each layer declares.
To illustrate what Peer said, please have a look at the SOAP HTTP Binding. One of its examples shows charset=utf-8 as part of the HTTP Content-Type header.
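To see the two declarations side by side, here is a Java sketch that extracts the charset from a Content-Type value and the encoding from the XML declaration, and flags when they disagree. The class and method names are hypothetical, the regular expressions are simplified, and the sample message is made up rather than copied from the SOAP specification. It only reports the mismatch; it does not decide which declaration should win.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingDeclarations {

    // charset parameter of a Content-Type value, optionally quoted.
    private static final Pattern HEADER_CHARSET = Pattern.compile(
        ";\\s*charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    // encoding pseudo-attribute inside an <?xml ...?> declaration.
    private static final Pattern PROLOG_ENCODING = Pattern.compile(
        "<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Optional<String> headerCharset(String contentType) {
        Matcher m = HEADER_CHARSET.matcher(contentType);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> prologEncoding(String xml) {
        Matcher m = PROLOG_ENCODING.matcher(xml);
        // lookingAt() anchors the match at the start of the document,
        // where the XML declaration must appear.
        return m.lookingAt() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True when both layers declare an encoding and the names differ. */
    static boolean conflict(String contentType, String xml) {
        Optional<String> http = headerCharset(contentType);
        Optional<String> prolog = prologEncoding(xml);
        return http.isPresent() && prolog.isPresent()
            && !http.get().toLowerCase(Locale.ROOT).equals(prolog.get().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String contentType = "application/soap+xml; charset=utf-8";
        String body = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><env:Envelope/>";
        System.out.println("HTTP header: " + headerCharset(contentType).orElse("(none)"));
        System.out.println("XML prolog:  " + prologEncoding(body).orElse("(none)"));
        System.out.println("Conflict:    " + conflict(contentType, body));
    }
}
```

Comparing names case-insensitively will miss aliases (for example `UTF8` versus `UTF-8`); normalizing both through `Charset.forName` would catch those, at the cost of handling unsupported names.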