filtering illegal characters in xml documents

Ervin Loh

Joined: Feb 15, 2002
Posts: 6
I have the following XML document:
<?xml version="1.0"?>
<other1> > </other1>
<other2> < </other2>
<other3> & </other3>
I'm using JAXP to parse the document, and I encountered the following error when parsing it:
C:\DOMEcho>java -classpath '.\;C:\DOMEcho;C:\lib\crimson.jar;C:\lib\jaxp.jar;C:\lib\xalan.jar;.' DOMEcho attribute.xml
Fatal Error: URI=file:C:/DOMEcho/attribute.xml Line=4: The content beginning "<
" is not legal markup. Perhaps the " " ( character should be a letter.
My investigation suggests that a character such as '>' (as in <other1> > </other1>) is invalid. Note that I cannot change '>' to its corresponding character entity by hand, because the XML document is generated by Velocity (a publishing framework).
Any ideas for solving this so that I can parse my documents successfully? I have tried reading in the entire XML string and converting the illegal characters to their equivalents, but it doesn't work. I would appreciate it if someone could suggest a solution (or even donate some code).
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 16145

You can do it one of two ways.
1. Use the "escape" sequences, such as &amp;, &lt;
2. Wrap the items in a CDATA like so:

Once the XML parser has read the data, the translation/unescaping will have been done for you. This is true for both character entities and CDATA sections.
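To illustrate both options, here is a minimal sketch using the JAXP DOM API. The class name EscapeDemo and the helper rootText are made up for this example, and the element name is borrowed from the question. It parses one string that uses escape sequences and one that uses a CDATA section, then prints the text the parser hands back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
public class EscapeDemo {

    // Parses the given XML string with JAXP and returns the text content
    // of the root element, as seen by the application after parsing.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Option 1: escape sequences in the source document.
        String escaped = "<other2> &lt; &amp; &gt; </other2>";
        // Option 2: the same characters wrapped in a CDATA section.
        String cdata = "<other2><![CDATA[ < & > ]]></other2>";

        // Both calls yield the same plain characters.
        System.out.println("[" + rootText(escaped) + "]");
        System.out.println("[" + rootText(cdata) + "]");
    }
}
```

In both cases the application sees the raw characters, so nothing needs to be decoded by hand after parsing.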
[ July 12, 2002: Message edited by: Tim Holloway ]

Customer surveys are for companies who didn't pay proper attention to begin with.
Ervin Loh

Joined: Feb 15, 2002
Posts: 6
The XML document is dynamically generated, so I cannot put a CDATA section into it.
I have been planning to write a while loop that repeatedly parses the XML document and replaces each illegal character with its corresponding entity. Can this work?
When the parser throws a SAXParseException, I'll call its getColumnNumber method (to get the column where the character is).
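As an alternative to the retry loop described above, the generated text could be pre-filtered once before it reaches the parser. The following is only a rough sketch of that swapped-in technique, not a tested solution for Velocity output. The class XmlPreFilter, its method names, and the <root> wrapper element are all assumptions made for this example. It relies on two regex heuristics: an '&' that does not begin an entity or character reference becomes &amp;, and a '<' that is not followed by a character that can start a tag, comment, CDATA section, or processing instruction becomes &lt;. A lone '>' is left alone, since it is legal in character data (except in the sequence "]]>").

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP.
public class XmlPreFilter {

    // '&' that does not start a reference such as &amp;, &#60; or &#x3C;
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // '<' that is not followed by a name start character, '/', '!' or '?'
    private static final Pattern BARE_LT = Pattern.compile("<(?![A-Za-z_:/!?])");

    // Escapes stray '&' first, so the "&lt;" added afterwards is not touched.
    static String escapeStrayMarkup(String xml) {
        String result = BARE_AMP.matcher(xml).replaceAll("&amp;");
        return BARE_LT.matcher(result).replaceAll("&lt;");
    }

    // Returns true if the string is well-formed according to the JAXP parser.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <root> is a made-up wrapper around the elements from the question.
        String raw = "<root><other1> > </other1><other2> < </other2>"
                   + "<other3> & </other3></root>";
        String cleaned = escapeStrayMarkup(raw);
        System.out.println(cleaned);
        System.out.println("raw parses: " + parses(raw));
        System.out.println("cleaned parses: " + parses(cleaned));
    }
}
```

The heuristics have known gaps: a stray '<' directly followed by a letter (as in "a <b") is not caught, and text inside comments or existing CDATA sections would be rewritten as well. Where possible, escaping the values inside the Velocity template, before they are written into the document, is likely to be more reliable.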