Using characters like quote, ampersand in xml

 
Pradeep Kadambar
Ranch Hand
Posts: 148
Currently I am using encoding="ISO-8859-1" for my XML file, and I parse it with a SAX parser.

Can anyone tell me how I can include characters like ", & and < in my XML tags?
Could UTF-8 be the solution? If so, how?
 
author
Posts: 11962
A literal "&" or "<" is illegal in XML text and attribute values regardless of the encoding you specify, and ">" is best escaped as well. You need to encode those special characters with entities like "&amp;", "&lt;" and "&gt;".
 
Pradeep Kadambar
Ranch Hand
Posts: 148
I am getting the data from the user and hence cannot restrict these characters.

E.g. a label can contain "Boeing" or, say, "Airbus", quotes included.
Then the XML must contain "&quot;Boeing&quot;".

Please suggest how I can do this programmatically.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
I just got through handling a similar problem. You can use the java.util.regex classes to do bulk replacement of the problem characters. For example, I have a class with:


The replacement operation looks like:

where tmp is a single line of input text - all occurrences of the recognized pattern get replaced with "&amp;".
However, this all gets very tricky with & because the & appears in the encoding for other special characters. So I first change all "& lt;" to "@lt;", (etc), then handle the single &, then change all "@lt;" to "& lt;"
It goes surprisingly fast.
Note that I had to insert spaces in "& lt;" etc in this post to get it to render properly.
Bill

[ October 24, 2004: Message edited by: William Brogden ]
[ October 24, 2004: Message edited by: William Brogden ]
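
The code blocks in the post above did not survive, so what follows is not Bill's original code. It is only a minimal sketch of the same idea, assuming a hypothetical class named XmlEscaper. The first method uses java.util.regex with a negative lookahead so that an "&" that already starts an entity is left alone, which avoids the temporary "@lt;" placeholders. The second method is an alternative for raw user input, where every special character should be escaped.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
public class XmlEscaper {
    // Matches an ampersand that does NOT already start one of the five
    // predefined entities or a numeric character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Replaces only bare ampersands; existing entities are untouched. */
    public static String escapeBareAmpersands(String tmp) {
        return BARE_AMP.matcher(tmp).replaceAll("&amp;");
    }

    /** Escapes all five special characters in one pass over raw input. */
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Which method fits depends on the input. If the text is partly escaped already, escapeBareAmpersands avoids double escaping. If it comes straight from a user, escape is probably the safer choice, because a user who types "&lt;" literally most likely wants to see those four characters again after parsing.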
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William. This is OK as long as you are dealing with ASCII characters. Will the XML file still be valid if I support other languages such as German or Latin?

If possible, please let me know how to write an XML format that works across all languages and character sets (Unicode).
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
In theory, the Java XML libraries should be able to handle any valid Unicode characters, but I don't have enough experience to advise you on what problems you would face.
Bill
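
As one way to try this out, here is a small sketch that lets the standard StAX writer (javax.xml.stream, a swapped-in technique that nobody in this thread mentioned) produce a UTF-8 document. The class name LabelWriter and the element name "label" are assumptions made up for the example. The XMLStreamWriter documentation says writeCharacters is required to escape &, < and >, so no manual replacement is needed on this path.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; "label" is an assumed element name.
public class LabelWriter {
    /** Writes <label>...</label> as a UTF-8 document and returns it as text. */
    public static String writeLabel(String label) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("label");
            w.writeCharacters(label);   // the writer escapes &, < and > here
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        // Decode with the same charset that the writer was asked to use.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Calling LabelWriter.writeLabel("R&D <Boeing>") should give a document whose label text contains "R&amp;D &lt;Boeing". The exact form of the XML declaration may differ between StAX implementations.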
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William, it's been a great help to me.

I came up with the idea of supporting these characters in UTF-8 output by writing each one as a numeric character reference: &# followed by the ASCII value and a semicolon (for example &#38; for &). It works fine for all keyboard entries (English US).
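
The numeric-reference idea can be sketched roughly as follows. This is not Pradeep's actual code: the class name NumericRefs is hypothetical, and the extension to characters above 127 is an assumption added so that German text such as "Müller" is covered too. Working with code points instead of chars keeps characters outside the Basic Multilingual Plane in one piece.

```java
// Hypothetical helper; the name is illustrative only.
public class NumericRefs {
    /** Rewrites XML special characters and non-ASCII characters as &#NNN; */
    public static String toNumericRefs(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            boolean special = cp == '&' || cp == '<' || cp == '>'
                           || cp == '"' || cp == '\'';
            if (special || cp > 127) {
                // Decimal numeric character reference, e.g. &#38; for '&'
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }
}
```

Since the result is plain ASCII, it reads the same whether the file declares ISO-8859-1 or UTF-8. The price is that non-English text becomes hard to read in the raw file.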
 