
reading XML with UTF8

Greg T Robertson
Ranch Hand

Joined: Nov 18, 2003
Posts: 91
I'm attempting to parse an XML document that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
When I get the value for test2, the string contains \\u00e6 and similar sequences, so instead
of the actual Unicode characters I get a '\' followed by additional characters. I'm doing the parsing like this:

using Xerces.
Does anyone have any idea why I am not getting the Unicode characters?
Lasse Koskela

Joined: Jan 23, 2002
Posts: 11962
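The "\u" escape is Java source syntax, not part of the XML standard, so an XML parser treats it as ordinary text. If you want the parser to interpret those sequences as Unicode characters, you need to use numeric character references, i.e. the &#...; notation (for example, &#xE6; or &#230; for æ).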
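
To make the difference concrete, here is a minimal sketch using the standard JAXP DOM API, which Xerces also implements. The original document and parsing code were not posted, so the `test1` wrapper element, the class name `CharRefDemo`, and the helper `readTest2` are assumptions for illustration only; `test2` is the element named in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CharRefDemo {
    // Hypothetical helper: parses UTF-8 XML text and returns the content
    // of the first <test2> element.
    static String readTest2(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("test2").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

        // A backslash escape is plain text to an XML parser, so the six
        // literal characters (backslash, u, 0, 0, e, 6) come back unchanged.
        String escaped = readTest2(
                header + "<test1><test2>\\u00e6</test2></test1>");

        // A numeric character reference is resolved by the parser itself,
        // so a single character (U+00E6) comes back.
        String reference = readTest2(
                header + "<test1><test2>&#xE6;</test2></test1>");

        System.out.println(escaped.length());
        System.out.println(reference.length());
    }
}
```

If the document cannot be changed, the backslash sequences would have to be decoded by your own code after parsing, since the parser will never do it for you.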

Author of Test Driven (2007) and Effective Unit Testing (2013)