
reading XML with UTF8

 
Greg T Robertson
Ranch Hand
Posts: 91
I'm attempting to parse an XML document that looks like this
<?xml version="1.0" encoding="UTF-8"?>
<Test>
<test2>\u00e6\u2013\u2021\u00e5\u00ad\u2014\u00e3\u0192\u2021\u00e3\u0192\u00bc\u00e3\u201a\u00bf</test2>
</Test>
When I get the value for test2, the string contains \\u00e6 and so on, so instead of
the actual Unicode characters I get a '\' followed by the remaining characters. I'm doing the parsing like this:

using Xerces.
Does anyone have any idea why I am not getting the Unicode characters?
thanks
 
Lasse Koskela
author
Sheriff
Posts: 11962
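The "\u" escape syntax belongs to Java source code, not to the XML standard, so the parser passes those sequences through as literal text. If you want the XML parser to interpret them as Unicode characters, you need to use numeric character references, i.e. the &#...; notation (for example &#xE6; for U+00E6).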
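
To illustrate the difference, here is a minimal sketch that uses the JDK's built-in JAXP DOM parser rather than a specific Xerces setup. The class name CharRefDemo and the helper textOfTest2 are made up for this example, and the sample documents are trimmed-down assumptions based on the XML in the question, not the original poster's parsing code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {

    // Hypothetical helper: parses the XML text and returns the content
    // of the first <test2> element.
    static String textOfTest2(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("test2").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

        // Java-style escape written into the XML: the parser sees six
        // ordinary characters (backslash, u, 0, 0, e, 6) and returns them as-is.
        String escaped =
                textOfTest2(header + "<Test><test2>\\u00e6</test2></Test>");

        // Numeric character reference: the parser resolves it to U+00E6.
        String charRef =
                textOfTest2(header + "<Test><test2>&#x00e6;</test2></Test>");

        System.out.println("escaped length = " + escaped.length());
        System.out.println("charRef length = " + charRef.length());
        System.out.println("charRef is U+00E6: " + charRef.equals("\u00e6"));
    }
}
```

Since the file is declared as UTF-8, writing the characters directly into the document is another option, as long as the file really is saved in that encoding.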
 