
Invalid/Special XML characters such as &, ', "

 
Ranch Hand
Posts: 65
Hi,

I am using a DOM parser in my application to create and parse XML files.

My XML might contain invalid XML characters such as &, <, > etc., so I need special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc. (I am putting a space after each &, otherwise JavaRanch will convert it to the equivalent character.) Is there any other way to handle these characters?

Also, where can I find a complete list of all special XML characters, so that I don't miss anything?

Thanks!
[ April 05, 2007: Message edited by: P Lavti ]
 
Rancher
Posts: 43081
The XML spec contains the list of special characters. They are &, <, >, ' and ".

Ampersand and left angle bracket must be escaped, right angle bracket may be escaped, and apostrophe and double quotes are only relevant if used in attribute values.
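
For the case where you really do write XML text yourself, here is a minimal sketch of those rules in Java. The class and method names (XmlEscape, escapeText, escapeAttribute) are made up for illustration and are not part of any library. It escapes > as well, which the spec allows but does not generally require.

```java
public final class XmlEscape {
    private XmlEscape() {}

    /** Escapes character data for use inside element content (a text node). */
    public static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // must be escaped
                case '<': out.append("&lt;"); break;  // must be escaped
                case '>': out.append("&gt;"); break;  // optional, but harmless
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes a value for use inside a quoted attribute value. */
    public static String escapeAttribute(String s) {
        // Quotes only matter in attribute values, so they are handled here only.
        return escapeText(s).replace("\"", "&quot;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        // prints <name>&lt;/ddd &amp; co</name>
        System.out.println("<name>" + escapeText("</ddd & co") + "</name>");
        // prints <item label="5&quot; disk"/>
        System.out.println("<item label=\"" + escapeAttribute("5\" disk") + "\"/>");
    }
}
```

Note that the ampersand is handled in the same pass as the other characters, so an & that is produced by an earlier replacement is never escaped a second time.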
 
P Lavti
Ranch Hand
Posts: 65
What about "<" + "/" + letter? For example: <name></ddd</name>

Is that also considered a special sequence during XML parsing?
 
Marshal
Posts: 28193
If the "</ddd" in your example is supposed to be a text node, then the "<" character has to be escaped (as Ulf Dittmer already explained). It makes no difference what characters appear elsewhere in the text node. It's a very simple rule; there is no need to make it more complicated.
 
Paul Clapham
Marshal
Posts: 28193

Originally posted by P Lavti:
I am using a DOM parser in my application to create and parse XML files.

My XML might contain invalid XML characters such as &, <, > etc., so I need special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc.

I think some clarification is required here. First of all, you don't use a DOM parser to create an XML file. You use it to parse an XML file. A parser (DOM or otherwise) reads an XML document and translates it into some internal format. Now, in the XML document those escaping rules are in effect, so that a < character must be represented as &lt; in a text node and so on. But the parser will "unescape" that character so that in your internal form (e.g. a DOM) you will just see the < character. You do not have to do that yourself.

Likewise, when you use a serializer to convert the internal form to an XML document, it will do that escaping for you. (It is common to use a javax.xml.transform.Transformer to do that.) The only times you need to apply that escaping rule yourself are when you are creating XML by hand in a text editor, or when you are using ordinary Java I/O to write out an XML document.
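
To illustrate the parsing half of this, here is a small sketch using the standard JAXP classes. The class name ParseUnescapes and the helper rootText are invented for the example, and the checked exceptions are simply wrapped to keep it short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseUnescapes {
    /** Parses an XML string and returns the text content of the root element. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Illustration only: real code would handle parser errors properly.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The XML document carries the escaped forms...
        String xml = "<name>&lt;/ddd &amp; more</name>";
        // ...but the DOM holds the plain characters. Prints: </ddd & more
        System.out.println(rootText(xml));
    }
}
```

The text node in the DOM contains the real < and & characters, so no unescaping code is needed on the application side.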
 
P Lavti
Ranch Hand
Posts: 65
Hi Paul,

I am using the DOM APIs to create the XML. Below is the portion of the code I am using. I am not sure whether I can call this "creating XML using DOM".



Now tell me whether I need to handle invalid XML characters myself, or whether the DOM APIs will take care of it.

Thanks!
 
Paul Clapham
Marshal
Posts: 28193
You are not creating any XML in that code. You are creating and updating a DOM object, adding nodes that can represent XML. Note that they only represent XML. A DOM object is not an XML document.

And no, you don't need to worry about escaping ampersands in your text nodes when you put them in a DOM object. I think I said that in my earlier post, didn't I?
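
As a rough sketch of what that looks like in practice (this is not the code from the post above, which is not shown here; the class name DomSerializeDemo and the element name "name" are just placeholders), you can put raw text into a text node and let a Transformer do the escaping when the DOM is written out:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomSerializeDemo {
    /** Builds a one-element DOM holding rawText and serializes it to XML. */
    public static String buildAndSerialize(String rawText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            // Raw, unescaped text goes into the DOM as-is.
            name.appendChild(doc.createTextNode(rawText));
            doc.appendChild(name);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The serializer applies the escaping rules on the way out.
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Illustration only: real code would handle these errors properly.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize("</ddd & more"));
    }
}
```

With the JDK's built-in serializer the text node should come out with &lt; and &amp; in place of < and &. Escaping the string before calling createTextNode would be a mistake, because the serializer would then escape the ampersands of your entities a second time.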

As for invalid XML characters, that's a new topic for this thread. Normally if you are just putting text into your text nodes, you won't encounter invalid characters. You only run into problems there when you have characters that aren't normal text. Don't worry about that just yet.
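
If you do want to check for such characters later, a minimal sketch follows. It tests code points against the Char production of the XML 1.0 spec (tab, line feed, carriage return, and the ranges #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The class and method names are invented for the example.

```java
public final class XmlChars {
    private XmlChars() {}

    /** True if the code point matches the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first character not allowed in XML 1.0, or -1. */
    public static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // an unpaired surrogate comes back as itself
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String ok = "plain text & more";
        String bad = "a" + (char) 0x0B + "b"; // vertical tab is not allowed
        System.out.println(firstInvalidIndex(ok));  // prints -1
        System.out.println(firstInvalidIndex(bad)); // prints 1
    }
}
```

Note that & and < pass this check. They are valid characters that need escaping in an XML document, which is a different matter from characters that cannot appear in XML 1.0 at all.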
 