
Using characters like quote, ampersand in xml

 
Pradeep Kadambar
Ranch Hand
Posts: 148
Currently I am using encoding="ISO-8859-1" for my XML file. To parse this XML I am using a SAX parser.

Can anyone tell me how I can include characters like ", & and < in my XML tags?
Could UTF-8 be the solution? If yes, how?
 
Lasse Koskela
author
Sheriff
Posts: 11962
"&", "<" and ">", for example, are illegal characters in XML regardless of the encoding you specify. You need to encode those special characters with entities like "&amp;", "&lt;" and "&gt;".
 
Pradeep Kadambar
Ranch Hand
Posts: 148
I am getting the data from the user and hence cannot restrict these characters.

E.g. a label can contain "Boeing" or, say, "Airbus".
Then the XML must contain "&quot;Boeing&quot;".

Please suggest how I can do this programmatically.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13055
I just got through handling a similar problem. You can use the java.util.regex classes to do bulk replacement of the problem characters. For example, I have a class with:


The replacement operation looks like:

where tmp is a single line of input text; all occurrences of the recognized pattern get replaced with "&amp;".
However, this all gets very tricky with & because the & appears in the encoding for other special characters. So I first change all "& lt;" to "@lt;" (etc.), then handle the single &, then change all "@lt;" back to "& lt;".
It goes surprisingly fast.
Note that I had to insert spaces in "& lt;" etc in this post to get it to render properly.
Bill

[ October 24, 2004: Message edited by: William Brogden ]
[ October 24, 2004: Message edited by: William Brogden ]
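
The code from the post above did not survive, so what follows is only a sketch of the three-step approach it describes, not the original class. The names (AmpersandFixer, fix) and the exact patterns are assumptions; the "@" placeholder and the variable name tmp come from the post.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction of the protect / escape / restore idea.
final class AmpersandFixer {
    // Entities that are already present in the input and must not be double-escaped.
    private static final Pattern EXISTING = Pattern.compile("&(amp|lt|gt|quot|apos);");
    // After protection, every remaining & is a bare ampersand.
    private static final Pattern BARE_AMP = Pattern.compile("&");
    // The protected form, with @ standing in for &.
    private static final Pattern PROTECTED = Pattern.compile("@(amp|lt|gt|quot|apos);");

    private AmpersandFixer() {}

    static String fix(String tmp) {
        String step1 = EXISTING.matcher(tmp).replaceAll("@$1;");    // "&lt;" becomes "@lt;"
        String step2 = BARE_AMP.matcher(step1).replaceAll("&amp;"); // escape the single &
        return PROTECTED.matcher(step2).replaceAll("&$1;");         // "@lt;" becomes "&lt;" again
    }
}
```

One caveat with this sketch: input that already contains a literal "@lt;" would be turned into "&lt;" in the last step. A single pattern with a negative lookahead, such as "&(?!(?:amp|lt|gt|quot|apos);)", avoids the placeholder altogether.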
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William. This is OK as long as you are dealing with ASCII characters. Will the XML file still be valid if I support other languages like German or, say, Latin?

If possible, please let me know how to write XML that works across all languages and character sets (Unicode).
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13055
In theory, the Java XML libraries should be able to handle any valid Unicode characters, but I don't have enough experience to advise you on what problems you would face.
Bill
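
For the Unicode question, one possible route is to let an XML library do both the escaping and the encoding. The sketch below uses the StAX writer (javax.xml.stream), which was added to Java SE after this thread was written; the class name Utf8LabelWriter and the element name "label" are assumptions for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes one <label> element as a UTF-8 encoded XML document.
final class Utf8LabelWriter {
    private Utf8LabelWriter() {}

    static String write(String label) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0"); // declared encoding matches the byte encoding
            w.writeStartElement("label");
            w.writeCharacters(label);             // the writer escapes markup characters itself
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        // Decoded here only so the result is easy to inspect; normally the bytes go to a file.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Calling Utf8LabelWriter.write("R&D <Gr\u00f6\u00dfe>") should yield a document whose text content has the & and < escaped while the German letters are stored as ordinary UTF-8 characters. The important part is that the encoding named in the XML declaration is the one actually used to write the bytes.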
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William, it's been a great help to me.

I came up with the idea of supporting ASCII characters in UTF-8 by writing each special character as a numeric character reference: &# followed by its ASCII value and a semicolon. It works fine for all keyboard entries (English US).
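
A minimal sketch of that idea, assuming decimal numeric character references of the form &#NN; are what was meant. The class name NumericRefEncoder is hypothetical, and extending the same treatment to characters above 127 is an assumption that goes beyond what the post states.

```java
// Hypothetical helper: writes reserved and non-ASCII characters as &#NN; references.
final class NumericRefEncoder {
    private NumericRefEncoder() {}

    static String encode(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);      // read whole code points, not UTF-16 halves
            i += Character.charCount(cp);
            boolean reserved =
                cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp == '\'';
            if (reserved || cp > 127) {
                out.append("&#").append(cp).append(';'); // decimal code point value
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With this sketch, the label "Boeing" including its quotes becomes &#34;Boeing&#34;, and a German word such as Größe becomes Gr&#246;&#223;e, so the output stays pure ASCII whatever encoding the file declares.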
 