I'm attempting to parse an XML document that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Test>
<test2>\u00e6\u2013\u2021\u00e5\u00ad\u2014\u00e3\u0192\u2021\u00e3\u0192\u00bc\u00e3\u201a\u00bf</test2>
</Test>

When I get the value of test2, the string contains \\u00e6 and so on, so instead of the actual Unicode characters I get a '\' followed by the remaining characters. I'm doing the parsing using Xerces. Does anyone have any idea why I am not getting the Unicode characters? Thanks.
posted 11 years ago
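The "\u" escaping approach is not part of the XML standard, so the parser treats those sequences as ordinary text. If you want the XML parser to interpret them as Unicode characters, you need to use numeric character references, i.e. the &#...; notation (for example, &#xE6; for U+00E6).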
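
To illustrate the difference, here is a minimal sketch using the standard JAXP DOM API (which Xerces implements). The class name UnicodeEscapeDemo and the helpers readTest2 and decodeEscapes are made up for this example and are not from the original post; the sample strings are shortened to two characters. It shows that numeric character references are resolved by the parser, while backslash-u sequences come through untouched. If you can't change how the XML is produced, decoding the sequences yourself after parsing is one possible workaround, assuming every escape is exactly four hex digits.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UnicodeEscapeDemo {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: parse an XML string and return the text of the first <test2>.
    static String readTest2(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("test2").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Hypothetical workaround: turn literal backslash-u sequences into real characters.
    static String decodeEscapes(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Numeric character references: the parser resolves these itself.
        String withRefs = "<Test><test2>&#xE6;&#x2013;</test2></Test>";
        // Backslash-u sequences: plain text as far as XML is concerned.
        String withEscapes = "<Test><test2>\\u00e6\\u2013</test2></Test>";

        System.out.println(readTest2(withRefs).length());                   // 2
        System.out.println(readTest2(withEscapes).length());                // 12
        System.out.println(decodeEscapes(readTest2(withEscapes)).length()); // 2
    }
}
```

The cleaner fix is still to have whatever generates the XML emit either the real UTF-8 characters or &#...; references, so that no post-processing is needed.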