reading XML with UTF8

Greg T Robertson
Ranch Hand

Joined: Nov 18, 2003
Posts: 91
I'm attempting to parse an XML document that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
When I get the value for test2, the string contains \\u00e6 and so on, so instead of the actual Unicode characters I get a '\' followed by additional characters. I'm doing the parsing like this:

using Xerces.
Does anyone have any idea why I am not getting the Unicode characters?
Lasse Koskela

Joined: Jan 23, 2002
Posts: 11962
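The "\u" escaping approach is a Java source-code convention and is not part of the XML standard, so the parser passes it through as literal text. If you want the XML parser to interpret those sequences as Unicode characters, you need to use the &#...; notation (numeric character references, for example &#xE6; or &#230; for æ).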
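
To illustrate the difference, here is a minimal sketch using the standard JAXP DOM API (Xerces is one implementation of it). The element name test2 comes from the question; the <root> wrapper, the class name CharRefDemo and the helper readTest2 are made up for this example, since the original document and parsing code were not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {

    // Hypothetical helper: parses the given XML text as UTF-8 bytes and
    // returns the text content of the first <test2> element.
    static String readTest2(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("test2").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

        // The document text holds a backslash, the letter u and four hex
        // digits. XML has no such escape, so the parser returns six characters.
        String withBackslash = header + "<root><test2>\\u00e6</test2></root>";

        // The document text holds a numeric character reference, which the
        // parser resolves to the single character U+00E6.
        String withCharRef = header + "<root><test2>&#xE6;</test2></root>";

        System.out.println(readTest2(withBackslash).length());
        System.out.println(readTest2(withCharRef).length());
    }
}
```

Since the document is declared as UTF-8, writing the character itself in the file (saved as UTF-8) should also work; the character reference is mainly useful when the file has to stay plain ASCII.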

Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]