
Using characters like quote, ampersand in xml

Pradeep Kadambar
Ranch Hand

Joined: Oct 18, 2004
Posts: 148
Currently I am using encoding="ISO-8859-1" for my XML file. To parse this XML I am using a SAX parser.

Can anyone tell me how I can include characters like ", &, < in my XML tags?
Could UTF-8 be the solution? If yes, how?
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
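"&" and "<" are reserved markup characters in XML, so they cannot appear literally in element content or attribute values, regardless of the encoding you specify; ">" is best escaped as well. You need to encode those special characters with entities like "&amp;", "&lt;" and "&gt;". A double quote inside an attribute value can be written as "&quot;".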
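
To make the entity replacement concrete, here is a minimal sketch of a single-pass escaper for the five characters that have predefined XML entities. The class and method names (XmlEscaper, escape) are made up for illustration, and the code assumes the input is raw user text that has not been escaped already; running it twice would escape the ampersands a second time.

```java
final class XmlEscaper {
    private XmlEscaper() {}

    /** Replaces the five characters that have predefined XML entities. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &quot;Boeing&quot; &amp; &lt;Airbus&gt;
        System.out.println(escape("\"Boeing\" & <Airbus>"));
    }
}
```

Because each input character is looked at exactly once, the "&" that belongs to a freshly written entity is never escaped again.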


Author of Test Driven (2007) and Effective Unit Testing (2013) [Blog] [HowToAskQuestionsOnJavaRanch]
Pradeep Kadambar
Ranch Hand

Joined: Oct 18, 2004
Posts: 148
I am getting the data from the user and hence cannot restrict these characters.

E.g. a label can contain "Boeing" or, say, "Airbus", with the quotes included.
Then the XML must contain "&quot;Boeing&quot;".

Please suggest how I can do this programmatically.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12822
I just got through handling a similar problem. You can use the java.util.regex classes to do bulk replacement of the problem characters. For example, I have a class with:


The replacement operation looks like:

where tmp is a single line of input text. All occurrences of the recognized pattern get replaced with "&amp;".
However, this all gets very tricky with & because the & also appears in the encoding for the other special characters. So I first change all "& lt;" to "@lt;" (etc.), then handle the single &, then change all "@lt;" back to "& lt;".
It goes surprisingly fast.
Note that I had to insert spaces in "& lt;" etc in this post to get it to render properly.
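
For readers who want something to run, here is a rough sketch of a java.util.regex replacement along these lines. It is not the code from the class mentioned above. Instead of the "@lt;" placeholder swap it uses a negative lookahead, so an ampersand is only replaced when it does not already start an entity or a numeric character reference. The names RegexXmlEscaper and escapeLine are hypothetical.

```java
import java.util.regex.Pattern;

final class RegexXmlEscaper {
    // An ampersand that does not already begin a predefined entity
    // or a numeric character reference such as &#34; or &#x22;.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9A-Fa-f]+);)");
    private static final Pattern LT = Pattern.compile("<");
    private static final Pattern GT = Pattern.compile(">");
    private static final Pattern QUOT = Pattern.compile("\"");

    private RegexXmlEscaper() {}

    /** Escapes one line of input text; existing entities are left alone. */
    static String escapeLine(String tmp) {
        // Handle the ampersand first, then the remaining problem characters.
        String s = BARE_AMP.matcher(tmp).replaceAll("&amp;");
        s = LT.matcher(s).replaceAll("&lt;");
        s = GT.matcher(s).replaceAll("&gt;");
        s = QUOT.matcher(s).replaceAll("&quot;");
        return s;
    }

    public static void main(String[] args) {
        // prints: AT&amp;T &lt; &lt;b&gt;
        System.out.println(escapeLine("AT&T &lt; <b>"));
    }
}
```

The patterns are compiled once and reused, which keeps the per-line cost low. One trade-off to be aware of: user text that happens to contain a literal string like "&lt;" is treated as already escaped, which may or may not be what you want.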
Bill

[ October 24, 2004: Message edited by: William Brogden ]
Pradeep Kadambar
Ranch Hand

Joined: Oct 18, 2004
Posts: 148
Thanks William. This is OK as long as you are dealing with ASCII characters. Will the XML file still be valid if I am going to support other languages like German or, say, Latin?

If possible, please let me know how to write an XML format that is generalized across all language and character set barriers (Unicode).
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12822
In theory, the Java XML libraries should be able to handle any valid Unicode characters, but I don't have enough experience to advise you on what problems you would face.
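
One way to try this out is to let the library do both the escaping and the encoding, then parse the result back. The sketch below writes a small document as UTF-8 with the standard javax.xml.stream.XMLStreamWriter and reads it again with a DOM parser. The element name "item", the attribute name "label" and the class name Utf8XmlRoundTrip are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Element;

final class Utf8XmlRoundTrip {
    private Utf8XmlRoundTrip() {}

    /** Writes <item label="...">text</item> as UTF-8; the writer escapes markup characters. */
    static byte[] write(String label, String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(bytes, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("item");
            w.writeAttribute("label", label);
            w.writeCharacters(text);
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();
            w.close();
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    /** Parses the bytes and returns { label attribute, element text }. */
    static String[] read(byte[] xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement();
            return new String[] { root.getAttribute("label"), root.getTextContent() };
        } catch (Exception e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        // "Gr\u00fc\u00dfe" is the German word "Grüße", written with Unicode escapes.
        String[] back = read(write("\"Boeing\" & <Airbus>", "Gr\u00fc\u00dfe"));
        System.out.println(back[0].equals("\"Boeing\" & <Airbus>"));
        System.out.println(back[1].equals("Gr\u00fc\u00dfe"));
    }
}
```

With this approach the user's text never has to be escaped by hand, and switching the declared encoding from ISO-8859-1 to UTF-8 lets the same file carry characters from any language.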
Bill
Pradeep Kadambar
Ranch Hand

Joined: Oct 18, 2004
Posts: 148
Thanks William, it's been a great help for me.

I came up with the idea of supporting ASCII characters in UTF-8 format by writing each one as &# followed by its ASCII value and a semicolon, e.g. &#34; for a double quote. It works fine for all keyboard entries (English US).
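
A minimal sketch of that idea, under the assumption that the markup characters and everything outside printable ASCII should be written as decimal numeric character references. The class and method names are hypothetical, and the code does not filter control characters that XML 1.0 disallows.

```java
final class NumericRefEscaper {
    private NumericRefEscaper() {}

    /** Writes markup characters and non-ASCII code points as &#NNN; references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            // Work on code points so characters outside the BMP stay in one piece.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean markup = cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp == '\'';
            if (markup || cp > 0x7E) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &#34;Boeing&#34; &#38; Z&#252;rich
        System.out.println(escape("\"Boeing\" & Z\u00fcrich"));
    }
}
```

The output is plain ASCII, so it reads back the same whether the file is declared as ISO-8859-1 or UTF-8.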
 