reading XML with UTF8
|
Greg T Robertson
Ranch Hand
Joined: Nov 18, 2003
Posts: 91
|
|
I'm attempting to parse an XML document that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Test>
  <test2>\u00e6\u2013\u2021\u00e5\u00ad\u2014\u00e3\u0192\u2021\u00e3\u0192\u00bc\u00e3\u201a\u00bf</test2>
</Test>

When I get the value for test2, the string contains \\u00e6, etc. So instead of getting the valid Unicode characters, I get '\' followed by additional characters. I'm doing the parsing like this: using Xerces.

Does anyone have any ideas as to why I am not getting the Unicode characters? Thanks
|
 |
Lasse Koskela
author
Sheriff
Joined: Jan 23, 2002
Posts: 11962
|
|
|
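The "\u" escaping approach is not part of the XML standard, so the parser passes those sequences through as literal text. If you want the XML parser to interpret them as Unicode characters, you need to use the numeric character reference notation, &#...; (for example, &#230; or &#xE6; for U+00E6).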
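
A minimal sketch of the difference, assuming the standard JAXP DocumentBuilder API rather than the original poster's Xerces code (which was not shown). The class name CharRefDemo and the helper test2Text are hypothetical names used only for illustration. It parses one document containing a literal backslash-u sequence and one containing a numeric character reference, then prints the length of each resulting string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {

    // Hypothetical helper: parses a UTF-8 XML string and returns the
    // text content of the first <test2> element.
    public static String test2Text(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("test2").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

        // The doubled backslash puts a literal backslash into the XML,
        // which is what the document in the question contains.
        String escaped = header + "<Test><test2>\\u00e6</test2></Test>";

        // A numeric character reference, which the XML parser does decode.
        String charRef = header + "<Test><test2>&#xE6;</test2></Test>";

        System.out.println(test2Text(escaped).length()); // prints 6
        System.out.println(test2Text(charRef).length()); // prints 1
    }
}
```

The first document yields the six characters backslash, u, 0, 0, e, 6 unchanged, while the second yields the single character U+00E6. If the file is actually saved as UTF-8, the characters can also be written directly in the document with no escaping at all.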
|
Author of Test Driven (2007) and Effective Unit Testing (2013)