I've been using XML to store data files for one of my Java apps for a while now, but once in a while I still run into parsing problems when a data file contains a non-ASCII character (for example, an accented "e" or a letter with an umlaut). I am using the Apache Xerces SAX parser. So at this point, I am wondering if anyone knows the best practice for encoding such characters, assuming that the users of the program can conceivably enter any visible character into the file. Note that I want not only for the parser not to choke, but also for the character to be displayed properly after the XML file is read back in.
For example, should I wrap all user-entered text in <![CDATA[ ]]> sections? Should I use the &#XX; format (numeric character references) to encode non-ASCII characters? Should I do both? This seems like it should be a relatively easy issue, but for some reason it hasn't been for me.
Dave Taubler
Specializing in Java and Web Development (http://taubler.com/articles/)
Joined: Jun 26, 2002
I have not used the class. I just found it via Google.
In the end I used code from a coworker and incorporated it into my open source project. Before releasing it, I want to compare what that project did against my implementation and do a little research, but here is the link to my code, which is pretty simple.
Look at the 'escape' method in the Utils class below. If it works for you, feel free to use my jar or even pilfer the code. I'm not that familiar with the whole special-characters concept in XML, so if anyone knows of a better way to do this, or a place to get the code, let me know.
The main method of my com.fdsapi.Utils class has sample usage:
I created this method so I could have a way to 'escape' all Strings within an array easily using my API. For example, the following would escape all Strings in ANY array and leave any other Object types unchanged.
[ April 10, 2005: Message edited by: steve souza ]
Here are the javadocs for a class within the Jakarta Commons package that escapes Strings.
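For anyone who wants to see the idea without pulling in a library, here is a rough sketch of the kind of escaping discussed in this thread. It is not the fdsapi code or the Jakarta Commons code; the class name XmlEscapeSketch and the method escapeXml are made up for illustration. It assumes you want the five predefined XML entities escaped and every character above 7-bit ASCII written as a decimal &#NNN; reference. It does not filter control characters that XML 1.0 forbids.

```java
class XmlEscapeSketch {

    // Hypothetical helper for illustration only, not a library API.
    // Escapes XML markup characters and turns every non-ASCII code point
    // into a decimal numeric character reference such as &#233;.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs become one reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        // Non-ASCII: write the code point number in decimal.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is an accented "e"
        System.out.println(escapeXml("caf\u00e9 & <b>"));
        // prints: caf&#233; &amp; &lt;b&gt;
    }
}
```

Because the output is plain ASCII, the file parses the same way whatever encoding it is saved in, and the parser hands back the original character when it resolves the reference. The other route is to skip the numeric references and make sure the file is actually written in the encoding named in its XML declaration (UTF-8 if there is none).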