Invalid/Special XML characters such as &, ', "

P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
Hi,

I am using a DOM parser in my application to create and parse XML files.

My XML might contain invalid XML characters such as &, <, > etc., so I need to do some special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc. (I am putting a space after the & because otherwise JavaRanch will convert them to the equivalent characters.) Is there any other way to handle these characters?

Also, where can I find a complete list of all special XML characters, so that I don't miss anything?

Thanks!
[ April 05, 2007: Message edited by: P Lavti ]

-P Lavti
SCJP 5.0 (88%)
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41885
    
The XML spec contains the list of special characters. They are &, <, >, ' and ".

Ampersand and left angle bracket must be escaped, right angle bracket may be escaped, and apostrophe and double quotes are only relevant if used in attribute values.
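To make those rules concrete, here is a minimal sketch of escaping by hand in Java. The class and method names (XmlEscapeSketch, escapeText, escapeAttribute) are made up for illustration and are not part of any library. It only covers the five special characters listed above, and it assumes attribute values are written inside double quotes.

```java
final class XmlEscapeSketch {

    /** Escapes a string for use as element content (a text node). */
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // must be escaped
                case '<': out.append("&lt;");  break; // must be escaped
                case '>': out.append("&gt;");  break; // optional, but harmless
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes a string for use inside a quoted attribute value. */
    static String escapeAttribute(String s) {
        // Quotes only matter in attribute values; escaping both kinds is safe.
        return escapeText(s).replace("\"", "&quot;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        // prints: <name>&lt;/ddd &amp; more</name>
        System.out.println("<name>" + escapeText("</ddd & more") + "</name>");
        // prints: <item label="5&quot; disk"/>
        System.out.println("<item label=\"" + escapeAttribute("5\" disk") + "\"/>");
    }
}
```

Something like this is only needed when you write XML out as plain strings yourself.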


P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
What about "<" + "/" + a letter? For example: <name></ddd</name>

Is that also considered a special sequence during XML parsing?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

If the "</ddd" in your example is supposed to be a text node, then the "<" character has to be escaped (as Ulf Dittmer already explained). It makes no difference what characters appear elsewhere in the text node. It's a very simple rule; there is no need to make it more complicated.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Originally posted by P Lavti:
I am using a DOM parser in my application to create and parse XML files.

My XML might contain invalid XML characters such as &, <, > etc., so I need to do some special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc.
I think some clarification is required here. First of all, you don't use a DOM parser to create an XML file. You use it to parse an XML file. A parser (DOM or otherwise) reads an XML document and translates it into some internal format. Now, in the XML document those escaping rules are in effect, so that a < character must be represented as &lt; in a text node and so on. But the parser will "unescape" that character so that in your internal form (e.g. a DOM) you will just see the < character. You do not have to do that yourself.
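A small sketch of that unescaping, using the standard JAXP DocumentBuilder. The class name ParseUnescapeSketch and the helper rootText are only illustrative, and the input reuses the "</ddd" example from earlier in this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParseUnescapeSketch {

    /** Parses an XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In the document, the text node is escaped...
        String xml = "<name>&lt;/ddd &amp; more</name>";
        // ...but the DOM holds the plain characters.
        System.out.println(rootText(xml)); // prints: </ddd & more
    }
}
```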

Likewise, when you use a serializer to convert the internal form to an XML document, it will do that escaping for you. (It is common to use a javax.xml.transform.Transformer to do that.) The only times you need to apply that escaping rule yourself are when you are creating XML by hand in a text editor or when you are using ordinary Java I/O to write out an XML document.
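Here is a rough sketch of the serializing direction with a Transformer. The class name SerializeEscapeSketch, the serialize helper and the element name "name" are assumptions for illustration. The text goes into the DOM unescaped, and the identity Transformer is expected to escape it on output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SerializeEscapeSketch {

    /** Builds <name>text</name> as a DOM and serializes it to a string. */
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            // The text is added as-is; no manual escaping here.
            name.appendChild(doc.createTextNode(text));
            doc.appendChild(name);

            // A Transformer created without a stylesheet copies input to output.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The output should contain &lt;/ddd &amp; more between the name tags.
        System.out.println(serialize("</ddd & more"));
    }
}
```

If you parse that output again, you get the original unescaped text back in the DOM.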
P Lavti
Ranch Hand

Joined: Mar 27, 2007
Posts: 65
Hi Paul,

I am using the DOM APIs to create the XML. Below is the portion of the code I am using. I am not sure whether I can call that creating XML using DOM.



Now tell me whether I need to handle invalid XML characters myself, or whether the DOM APIs will take care of it.

Thanks!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

You are not creating any XML in that code. You are creating and updating a DOM object, adding nodes that can represent XML. Note that they only represent XML. A DOM object is not an XML document.

And no, you don't need to worry about escaping ampersands in your text nodes when you put them in a DOM object. I think I said that in my earlier post, didn't I?

As for invalid XML characters, that's a new topic for this thread. Normally if you are just putting text into your text nodes, you won't encounter invalid characters. You only run into problems there when you have characters that aren't normal text. Don't worry about that just yet.
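For when it does come up, here is a minimal sketch of a check against the character ranges that XML 1.0 allows (the Char production in the spec). The names XmlCharSketch, isXml10Char and firstInvalidIndex are made up for illustration. Unlike & and <, characters outside these ranges cannot be fixed by escaping; they have to be removed or encoded some other way before the text goes into the document.

```java
final class XmlCharSketch {

    /** True if the code point is allowed in an XML 1.0 document. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first disallowed character, or -1 if none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("plain text & <markup>")); // prints: -1
        System.out.println(firstInvalidIndex("bad\u0001char"));         // prints: 3
    }
}
```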
 