I am using a DOM parser in my application to create and parse XML files.
My XML might contain invalid XML characters such as &, <, > etc., so I need special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc. (I am putting a space after the &, otherwise JavaRanch will convert it to the equivalent character.) Is there any other way to handle these characters?
Also, where can I find a complete list of all special XML characters, so that I don't miss anything?
Thanks! [ April 05, 2007: Message edited by: P Lavti ]
-P Lavti, SCJP 5.0 (88%)
Joined: Mar 22, 2005
The XML spec contains the list of special characters. They are &, <, >, ' and ".
Ampersand and left angle bracket must be escaped, right angle bracket may be escaped, and apostrophe and double quotes are only relevant if used in attribute values.
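For the case where you really do have to escape by hand, here is a minimal sketch of what that could look like. The class and method names (XmlEscape, escape) are made up for illustration and are not part of any library. It escapes all five characters everywhere, which is more than the rule above strictly requires but is always safe.

```java
public final class XmlEscape {
    private XmlEscape() {}

    /** Replaces the five special XML characters with their predefined entities. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // must always be escaped
                case '<':  out.append("&lt;");   break; // must always be escaped
                case '>':  out.append("&gt;");   break; // optional, but harmless
                case '"':  out.append("&quot;"); break; // matters in attribute values
                case '\'': out.append("&apos;"); break; // matters in attribute values
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & 'c'"));
    }
}
```

The ampersand is handled character by character here, so an already escaped string would get escaped a second time. Only feed it raw text.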
If the "</ddd" in your example is supposed to be a text node, then the "<" character has to be escaped (as Ulf Dittmer already explained). It makes no difference what characters appear elsewhere in the text node. It's a very simple rule, there is no need to make it more complicated.
Originally posted by P Lavti: I am using a DOM parser in my application to create and parse XML files.
My XML might contain invalid XML characters such as &, <, > etc., so I need special handling. For that I am converting the characters into & amp;, & lt;, & gt; etc.
I think some clarification is required here. First of all, you don't use a DOM parser to create an XML file. You use it to parse an XML file. A parser (DOM or otherwise) reads an XML document and translates it into some internal format. Now, in the XML document those escaping rules are in effect, so that a < character must be represented as &lt; in a text node, and so on. But the parser will "unescape" that character, so that in your internal form (e.g. a DOM) you will just see the < character. You do not have to do that yourself.
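To see the unescaping for yourself, something like the following sketch should do. It uses the standard JAXP DocumentBuilder; the class name ParseUnescapes, the method rootText and the sample element name "note" are just placeholders I picked for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public final class ParseUnescapes {
    private ParseUnescapes() {}

    /** Parses the XML string and returns the text content of the root element. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // The DOM holds the unescaped characters, not the entity references.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The document text contains &lt; and &amp;, the DOM text node does not.
        System.out.println(rootText("<note>a &lt; b &amp; c</note>"));
    }
}
```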
Likewise, when you use a serializer to convert the internal form to an XML document, it will do that escaping for you. (It is common to use a javax.xml.transform.Transformer to do that.) The only times you need to apply that escaping rule yourself are when you are creating XML by hand in a text editor, or when you are using ordinary Java I/O to write out an XML document.
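Going the other direction, here is a rough sketch of putting raw text into a DOM and letting a Transformer write it out. Again the names SerializeEscapes, toXml and "note" are invented for the example. Notice that the text goes into createTextNode unescaped.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public final class SerializeEscapes {
    private SerializeEscapes() {}

    /** Builds a one-element DOM around the raw text and serializes it to a string. */
    public static String toXml(String rawText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element note = doc.createElement("note");
            note.appendChild(doc.createTextNode(rawText)); // raw text, no manual escaping
            doc.appendChild(note);

            // An identity transform copies the DOM to the output and escapes as needed.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("a < b & c"));
    }
}
```

If you had escaped the text yourself before calling createTextNode, the serializer would escape the ampersands of your entities again and you would end up with doubly escaped output.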
Joined: Mar 27, 2007
I am using the DOM APIs to create the XML. Below is the portion of the code I am using. I am not sure whether I can call that creating XML using DOM.
Now tell me whether I need to handle invalid XML chars myself, or will that be taken care of by the DOM APIs?
You are not creating any XML in that code. You are creating and updating a DOM object, adding nodes that can represent XML. Note that they only represent XML. A DOM object is not an XML document.
And no, you don't need to worry about escaping ampersands in your text nodes when you put them in a DOM object. I think I said that in my earlier post, didn't I?
As for invalid XML characters, that's a new topic for this thread. Normally if you are just putting text into your text nodes, you won't encounter invalid characters. You only run into problems there when you have characters that aren't normal text. Don't worry about that just yet.
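If you do want to check for such characters at some point, a sketch along these lines might help. It tests each code point against the Char production of the XML 1.0 spec (tab, line feed, carriage return, and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF). The class and method names are hypothetical, and what to do with an offending character (reject, strip, replace) is left to you.

```java
public final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first disallowed character, or -1 if there is none. */
    public static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // an unpaired surrogate comes back as-is and fails the check
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("a < b & c"));   // ordinary text, including & and <
        System.out.println(firstInvalidIndex("ab" + (char) 1 + "c")); // contains a control character
    }
}
```

Note that & and < pass this check. They are perfectly valid characters that merely need escaping in the document, which is a different thing from a character that is not allowed in XML at all.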