XML special characters... how to encode them?

dave taubler
Ranch Hand

Joined: May 15, 2001
Posts: 132
Hi,

I've been using XML to store data files for one of my Java apps for a while now, but once in a while I still run into parsing problems when a data file contains a non-ASCII character (for example, an accented "e" or a letter with an umlaut). I am using the Apache Xerces SAX parser. So at this point I am wondering whether anyone knows the best practice for encoding such characters, assuming that users of the program can conceivably enter any visible character into the file. Note that I want the parser not to choke, and I also want the character to display properly after the XML file is read back in.

For example, should I put all user-entered text inside <![CDATA[ ... ]]> sections? Should I use the &#XXX; format to encode non-ASCII characters? Should I do both? This seems like it should be a relatively easy issue, but for some reason it hasn't been for me.
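On the &#XXX; question, here is a minimal sketch, assuming plain Java with the JDK's built-in SAX parser rather than a standalone Xerces jar. The hypothetical escapeToAscii helper writes the five XML markup characters as entities and every non-ASCII code point as a decimal numeric character reference, so the resulting document is pure ASCII and survives any ASCII-compatible file encoding. The hypothetical parseText helper reads it back to check that the accented characters come through intact. A CDATA section, by contrast, only stops < and & from being treated as markup; it does nothing about the file's encoding, and it cannot contain the sequence ]]>.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NumericRefDemo {

    /**
     * Hypothetical helper: escapes the five XML markup characters and writes
     * every non-ASCII code point as a decimal reference (&#NNN;).
     * Assumes the input has no control characters that XML 1.0 forbids.
     */
    static String escapeToAscii(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // handles surrogate pairs as one code point
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);                  // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // e.g. U+00E9 -> &#233;
                    }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Hypothetical helper: parses the document with SAX and returns its text content. */
    static String parseText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String userText = "Caf\u00e9 & M\u00fcller <draft>";
        String xml = "<?xml version=\"1.0\"?><note>" + escapeToAscii(userText) + "</note>";
        System.out.println(xml);
        System.out.println(parseText(xml).equals(userText));
    }
}
```

Running main should print the escaped document followed by true, showing that the parser turns the references back into the original characters.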


Dave Taubler
Specializing in Java and Web Development: http://taubler.com/articles/
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 852
I was recently looking into this issue and found the class below, although I find it strange that something this common isn't part of the JDK.

http://cvs.sourceforge.net/viewcvs.py/groboutils/projects/util-xml/sources/dev/net/sourceforge/groboutils/util/xml/v1/Attic/XMLUtil.java?rev=1.2&only_with_tag=v3&view=auto
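For what it's worth, JDKs from Java 6 onward ship the StAX API (javax.xml.stream), and its XMLStreamWriter documentation requires writeCharacters to escape &, < and > in text content. A minimal sketch, assuming Java 6 or later; toXml is a hypothetical helper name, not part of the XMLUtil class linked above.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxEscapeDemo {

    /** Hypothetical helper: writes one element whose text is escaped by the StAX writer. */
    static String toXml(String elementName, String text) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartElement(elementName);
            writer.writeCharacters(text); // &, < and > are escaped by the writer here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("note", "Tom & Jerry <draft>"));
    }
}
```

Letting a writer own the escaping means application code never builds markup by string concatenation, which is where unescaped & and < usually slip in.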


http://www.jamonapi.com/ - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
dave taubler
Ranch Hand

Joined: May 15, 2001
Posts: 132
I agree; it seems that something like that should be part of the JDK. It also seems odd that Unicode is such a problem within XML. Oh well.
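Unicode itself is rarely the culprit; a common cause of this kind of parse failure is a file whose bytes are in one encoding while the XML declaration (or, with no declaration, the XML default of UTF-8) says another. A minimal sketch, assuming the JDK's SAX parser: the hypothetical writeDocument helper always declares UTF-8 but writes the bytes in whatever charset it is given. When the two match, é and ü round-trip with no escaping at all; when the bytes are actually ISO-8859-1, the parser rejects the document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class Utf8RoundTrip {

    /**
     * Hypothetical helper: declares UTF-8 but encodes the bytes with the given charset.
     * Assumes escapedText already has &, < and > escaped.
     */
    static byte[] writeDocument(String escapedText, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<note>" + escapedText + "</note>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Parses raw bytes, letting the parser honour the encoding declaration. */
    static String readText(byte[] document) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(document),
                    new DefaultHandler() {
                        @Override
                        public void characters(char[] ch, int start, int length) {
                            text.append(ch, start, length);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "Caf\u00e9 M\u00fcller";

        byte[] good = writeDocument(original, StandardCharsets.UTF_8);
        System.out.println(readText(good).equals(original)); // true

        byte[] bad = writeDocument(original, StandardCharsets.ISO_8859_1);
        try {
            readText(bad);
            System.out.println("parsed (unexpected)");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getCause());
        }
    }
}
```

In practice this means opening the output file with an explicit charset (an OutputStreamWriter with UTF-8 rather than FileWriter's platform default) and handing the parser an InputStream so it can read the declaration itself.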

Anyway, thanks for pointing me to that class. Have you used it much? Had good results with it?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Moving to XML...


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 852
I have not used the class. I just found it via google.

In the end I used code from a coworker and incorporated it into my open source project. I want to compare what that project did with my implementation and do a little research before releasing it, but here is the link to my code, which is pretty simple.

Look at the 'escape' method in the Utils class below. If it works for you, feel free to use my jar or even pilfer the code. I'm not that familiar with the whole special-characters concept in XML, so if anyone knows of a better way to do this, or a place to get the code, let me know.

http://cvs.sourceforge.net/viewcvs.py/fdsapi/fdsapi/Code/com/fdsapi/Utils.java?rev=1.10&view=auto

The main method of my com.fdsapi.Utils class has sample usage:



I created this method so I could easily 'escape' all Strings within an array using my API. For example, the following would escape all Strings in ANY array and leave any other Object types unchanged.
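A hypothetical sketch of the same idea, not the fdsapi code: escapeAll returns a copy of an Object[] in which every String has the five XML markup characters replaced by entities, and every other element, including null, is passed through untouched. Both method names are illustrative, and this version handles only Object[], not primitive arrays.

```java
import java.util.Arrays;

public class EscapeArrayDemo {

    /** Hypothetical helper: replaces the five characters XML predefines entities for. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns a copy with every String escaped; other elements are copied as-is. */
    static Object[] escapeAll(Object[] values) {
        Object[] result = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            Object v = values[i];
            result[i] = (v instanceof String) ? escape((String) v) : v;
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] row = {"Tom & Jerry", Integer.valueOf(42), null, "<b>"};
        System.out.println(Arrays.toString(escapeAll(row)));
        // prints [Tom &amp; Jerry, 42, null, &lt;b&gt;]
    }
}
```

Returning a copy rather than escaping in place is a design choice here: it leaves the caller's original data unescaped for non-XML uses.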


[ April 10, 2005: Message edited by: steve souza ]
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 852
Here are the javadocs for a class in the Jakarta Commons Lang library that escapes Strings.

http://jakarta.apache.org/commons/lang/apidocs/org/apache/commons/lang/StringEscapeUtils.html

Here is the code for StringEscapeUtils
http://cvs.apache.org/viewcvs.cgi/jakarta-commons/lang/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=1.30&view=markup

which in turn simply calls the Entities class:
http://cvs.apache.org/viewcvs.cgi/jakarta-commons/lang/src/java/org/apache/commons/lang/Entities.java?rev=1.19&view=markup

Download it from here:
http://jakarta.apache.org/commons/lang

The method follows. I haven't tried it though:

[ April 10, 2005: Message edited by: steve souza ]
 