reading XML with UTF8

Greg T Robertson
Ranch Hand

Joined: Nov 18, 2003
Posts: 91
I'm attempting to parse an XML document that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Test>
<test2>\u00e6\u2013\u2021\u00e5\u00ad\u2014\u00e3\u0192\u2021\u00e3\u0192\u00bc\u00e3\u201a\u00bf</test2>
</Test>
When I get the value of test2, the string contains \\u00e6, etc., so instead
of getting the valid Unicode characters I get a '\' followed by additional characters. I'm doing the parsing like this:

using Xerces.
Does anyone have any idea why I am not getting the Unicode characters?
Thanks
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
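The "\u" escape syntax is a Java source-code convention, not part of the XML standard, so the parser passes those sequences through as plain text. If you want the XML parser to interpret them as Unicode characters, you need to use numeric character references, i.e. the &#...; notation (for example, &#x00e6; in place of \u00e6).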
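
To illustrate the difference, here is a minimal sketch using the standard JAXP DOM API (which Xerces implements). The class name EscapeDemo and the helpers readTest2 and decodeJavaEscapes are made up for this example and are not from the original post. It parses one document that uses numeric character references and one that uses backslash-u sequences, and shows a possible fallback that decodes the backslash-u form by hand when the document itself cannot be changed. The fallback assumes every escape has exactly four hex digits.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
class EscapeDemo {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern JAVA_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Parses the XML text and returns the text content of the first test2 element.
    static String readTest2(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("test2").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Fallback: turns Java-style backslash-u sequences into real characters.
    // The XML parser does not do this; it is an extra step applied afterwards.
    static String decodeJavaEscapes(String text) {
        Matcher m = JAVA_ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Numeric character references: resolved by the parser itself.
        String withReferences = header + "<Test><test2>&#x00e6;&#x2013;</test2></Test>";
        // Java-style escapes: plain text as far as the parser is concerned.
        String withEscapes = header + "<Test><test2>\\u00e6\\u2013</test2></Test>";

        String fromReferences = readTest2(withReferences);
        String raw = readTest2(withEscapes);

        System.out.println(fromReferences.length()); // two characters
        System.out.println(raw.length());            // twelve characters, backslashes included
        System.out.println(fromReferences.equals(decodeJavaEscapes(raw)));
    }
}
```

If you control how the XML is produced, writing numeric character references (or the characters themselves, encoded as UTF-8) is the cleaner route; the manual decoding step is only a workaround.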


Author of Test Driven (2007) and Effective Unit Testing (2013)
 