
XML special characters... how to encode them?

 
dave taubler
Ranch Hand
Posts: 132
Hi,

I've been using XML to store data files for one of my Java apps for a while now, but once in a while I still run into parsing problems when a data file contains a character outside standard ASCII (for example, an accented "e" or a letter with an umlaut). I am using the Apache Xerces SAX parser. So at this point, I am wondering if anyone knows the best practice for encoding such characters, assuming that the users of the program can conceivably enter any visible character into the file. Note that I want the parser not to choke, and I also want the character to display properly after the XML file is read back in.

For example, should I wrap all user-entered text in <![CDATA[ ]]> sections? Should I use the &#XXX; format to encode non-ASCII characters? Should I do both? This seems like it should be a relatively easy issue, but for some reason it hasn't been for me.
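Both options in the question can be tried against a SAX parser directly. The sketch below uses the JAXP `SAXParserFactory` that ships with the JDK (not a separately downloaded Xerces jar) and assumes the data file is written as UTF-8 with a matching `encoding` declaration; the class name `RoundTrip` and the sample text are made up for illustration. It parses the same name three ways: raw accented characters, `&#N;` numeric character references, and raw characters saved as ISO-8859-1 while still declaring UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class RoundTrip {
    // Parses an XML document with the JDK's SAX parser and returns all of its
    // character data concatenated, i.e. what the application "reads back in".
    public static String textOf(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so append each one.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // 1) Raw accented characters; the bytes really are UTF-8, as declared.
        byte[] raw = (decl + "<name>Ren\u00e9 M\u00fcller</name>")
                .getBytes(StandardCharsets.UTF_8);
        // 2) Same text using only ASCII plus numeric character references.
        byte[] refs = (decl + "<name>Ren&#233; M&#252;ller</name>")
                .getBytes(StandardCharsets.UTF_8);
        // 3) Raw characters saved as ISO-8859-1 but still declared as UTF-8.
        byte[] mismatched = (decl + "<name>Ren\u00e9 M\u00fcller</name>")
                .getBytes(StandardCharsets.ISO_8859_1);

        // Cases 1 and 2 decode to the same Java string.
        System.out.println(textOf(raw).equals(textOf(refs)));
        try {
            textOf(mismatched);
        } catch (Exception e) {
            // The parser rejects bytes that are not valid UTF-8.
            System.out.println("mismatched encoding rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the first two documents decode to the same string, so either form should come back intact as long as the file's bytes match its declared encoding; the third case, a mismatch between bytes and declaration, is a common source of the kind of parse failure described above. A CDATA section would not change this: it only stops `<` and `&` from being treated as markup, its contents are decoded with the same encoding as the rest of the file, and a `&#233;` inside it stays literal text.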
 
steve souza
Ranch Hand
Posts: 862
I was recently looking into this issue and came across this class, though I find it strange that something this common isn't part of the JDK.

http://cvs.sourceforge.net/viewcvs.py/groboutils/projects/util-xml/sources/dev/net/sourceforge/groboutils/util/xml/v1/Attic/XMLUtil.java?rev=1.2&only_with_tag=v3&view=auto
 
dave taubler
Ranch Hand
Posts: 132
I agree (it seems like something like that should be part of the JDK)... also, it just seems odd that Unicode is such a problem in XML. Oh well.

Anyway, thanks for pointing me to that class. Have you used it much? Had good results with it?
 
Ilja Preuss
author
Sheriff
Posts: 14112
Moving to XML...
 
steve souza
Ranch Hand
Posts: 862
I have not used the class. I just found it via google.

In the end I took code from a coworker and incorporated it into my open source project. I want to compare what that project did with my implementation and do a little research before releasing it, but here is the link to my code, which is pretty simple.

Look at the 'escape' method in the Utils class below. If it works for you, feel free to use my jar or even pilfer the code. I'm not that familiar with the whole special-characters concept in XML, so if anyone knows a better way to do this, or a better place to get the code, let me know.

http://cvs.sourceforge.net/viewcvs.py/fdsapi/fdsapi/Code/com/fdsapi/Utils.java?rev=1.10&view=auto
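For comparison, here is a minimal sketch of what such an escape method can look like. It is not the fdsapi `Utils.escape` code; the class `XmlEscaper` and its `asciiOnly` flag are hypothetical. It replaces the five characters XML treats specially and, when `asciiOnly` is true, writes every code point above U+007F as a decimal numeric character reference, so the output is plain ASCII and does not depend on the file's encoding, provided that encoding is ASCII-compatible.

```java
public class XmlEscaper {
    /**
     * Hypothetical helper: escapes the five XML special characters and,
     * if asciiOnly is true, writes each non-ASCII code point as &#N;.
     */
    public static String escape(String s, boolean asciiOnly) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over a surrogate pair as one character
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (asciiOnly && cp > 0x7F) {
                        // Decimal numeric character reference, e.g. &#233; for e-acute.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Caf&#233; &lt;b&gt; &amp; &quot;M&#252;ller&quot;
        System.out.println(escape("Caf\u00e9 <b> & \"M\u00fcller\"", true));
    }
}
```

Iterating by code point rather than by `char` keeps characters outside the Basic Multilingual Plane (which Java stores as surrogate pairs) as a single reference such as `&#128512;`. The sketch does not handle control characters such as U+0001, which XML 1.0 does not allow even as character references.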

The main method of my com.fdsapi.Utils class has sample usage:



I created this method, so I could have a way to 'escape' all Strings within an array easily using my API. For example the following would escape all Strings in ANY array, and leave any other Object types unchanged.
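The array-wide escaping described here can be sketched roughly as follows. The names `ArrayEscaper` and `escapeAll` are hypothetical and the fdsapi signature may differ; this version returns a copy instead of modifying the input, recurses into nested `Object[]` arrays, and only escapes the five XML special characters.

```java
import java.util.Arrays;

public class ArrayEscaper {
    // Minimal escaping of the five XML special characters.
    static String escape(String s) {
        return s.replace("&", "&amp;") // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /**
     * Hypothetical helper: returns a copy of the array in which every String
     * element is escaped, nested Object[] arrays are processed recursively,
     * and all other elements (including null) are left unchanged.
     */
    public static Object[] escapeAll(Object[] in) {
        Object[] out = in.clone(); // same runtime array type as the input
        for (int i = 0; i < out.length; i++) {
            if (out[i] instanceof String) {
                out[i] = escape((String) out[i]);
            } else if (out[i] instanceof Object[]) {
                out[i] = escapeAll((Object[]) out[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {"Tom & Jerry", 42, null, new String[] {"<b>"}};
        // Prints: [Tom &amp; Jerry, 42, null, [&lt;b&gt;]]
        System.out.println(Arrays.deepToString(escapeAll(row)));
    }
}
```

Returning a copy keeps the caller's data unescaped for non-XML uses; an in-place version would avoid the extra allocation.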


[ April 10, 2005: Message edited by: steve souza ]
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Here are the javadocs for a class within the Jakarta Commons package that escapes Strings.

http://jakarta.apache.org/commons/lang/apidocs/org/apache/commons/lang/StringEscapeUtils.html

Here is the code for StringEscapeUtils
http://cvs.apache.org/viewcvs.cgi/jakarta-commons/lang/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=1.30&view=markup

which in turn simply calls the Entities class:
http://cvs.apache.org/viewcvs.cgi/jakarta-commons/lang/src/java/org/apache/commons/lang/Entities.java?rev=1.19&view=markup

Download it from here:
http://jakarta.apache.org/commons/lang

The method follows. I haven't tried it though:

[ April 10, 2005: Message edited by: steve souza ]
 