filtering illegal characters in xml documents

Ervin Loh

I have the following XML document:
<?xml version="1.0"?>
<other1> > </other1>
<other2> < </other2>
<other3> & </other3>
I'm using JAXP to parse the document. I encountered the following error when parsing it:
C:\DOMEcho>java -classpath '.\;C:\DOMEcho;C:\lib\crimson.jar;C:\lib\jaxp.jar;C:\
lib\xalan.jar;.' DOMEcho attribute.xml
Fatal Error: URI=file:C:/DOMEcho/attribute.xml Line=4: The content beginning "<
" is not legal markup. Perhaps the " " ( character should be a letter.
My investigation suggests that a character such as '>' (as in <other1> > </other1>) is invalid. Note that I cannot change '>' to its corresponding escaped form at the source (the XML document is generated by the Velocity publishing framework).
Any ideas for solving this so that I can parse my documents successfully? I have tried reading in the entire XML string and converting the illegal characters to their equivalents, but it doesn't work. I would appreciate it if someone could suggest a solution (or even donate some code).
Tim Holloway
You can do it one of two ways.
1. Use the "escape" sequences, such as &amp;, &lt;
2. Wrap the items in a CDATA section, like so:
   <other3><![CDATA[ & ]]></other3>

Once the XML parser reads in the info, the translation/escaping will have been done for you. This is true for both character entities and CDATA sections.
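
To illustrate the point about the parser undoing the escaping, here is a minimal sketch using the standard JAXP DOM API. The class name EscapeDemo, the helper rootText, and the sample strings are made up for illustration and are not from the original posts.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: the name and sample strings are assumptions.
class EscapeDemo {

    // Parses an XML string with JAXP and returns the text inside the root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Option 1: entity references are expanded by the parser.
        System.out.println("[" + rootText("<other3> &amp; </other3>") + "]");
        System.out.println("[" + rootText("<other2> &lt; </other2>") + "]");

        // Option 2: a CDATA section is passed through as literal text.
        System.out.println("[" + rootText("<other3><![CDATA[ & ]]></other3>") + "]");
    }
}
```

In both the entity and the CDATA case, the application code sees the plain character (for example " & ") and never has to decode anything itself.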
Ervin Loh

The XML document is dynamically generated, so I cannot put a CDATA section into it.
I have been planning to write a while loop that repeatedly parses the XML document and replaces each illegal character with its corresponding escaped form. Can this work?
When it encounters a SAXParseException, I'll call its getColumnNumber method (to get the column where the character is).
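
One possible alternative to the parse-and-retry loop is to escape the generated text once, before it reaches the parser. The sketch below is not the loop described above; it swaps in a regex pre-filter. It assumes that a '&' which does not start an entity reference, and a '<' which is not followed by a name character, '/', '!' or '?', are stray data characters. That heuristic will misread text such as "a<b", so treat it as a starting point only. The class name XmlPreFilter, the method sanitize, and the <root> wrapper element are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pre-filter: names and heuristics are assumptions, not from the thread.
class XmlPreFilter {

    // Escapes '&' and '<' characters that do not look like markup.
    static String sanitize(String raw) {
        // A '&' that is not the start of a named or numeric entity reference.
        String s = raw.replaceAll(
                "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
        // A '<' that is not followed by something that can begin a tag,
        // comment, CDATA section, or processing instruction.
        return s.replaceAll("<(?![A-Za-z_:/!?])", "&lt;");
    }

    public static void main(String[] args) throws Exception {
        // The sample elements from the question, wrapped in one assumed root element.
        String generated = "<?xml version=\"1.0\"?>\n"
                + "<root>\n"
                + "<other1> > </other1>\n"
                + "<other2> < </other2>\n"
                + "<other3> & </other3>\n"
                + "</root>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(sanitize(generated))));

        // Prints the text content of the document with the characters restored.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The pre-filter avoids depending on the position reported by SAXParseException, which may not point exactly at the offending character.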
