
Filtering illegal characters in XML documents

 
Ervin Loh
I have the following XML document:
<?xml version="1.0"?>
<attribute>
<other1> > </other1>
<other2> < </other2>
<other3> & </other3>
</attribute>
I'm using JAXP to parse the document, and I get the following error when parsing it:
C:\DOMEcho>java -classpath '.\;C:\DOMEcho;C:\lib\crimson.jar;C:\lib\jaxp.jar;C:\
lib\xalan.jar;.' DOMEcho attribute.xml
Fatal Error: URI=file:C:/DOMEcho/attribute.xml Line=4: The content beginning "<
" is not legal markup. Perhaps the " " ( character should be a letter.
My investigation suggests that a character such as '>' (as in <other1> > </other1>) is invalid. Any ideas for solving this? Note that I cannot change '>' to its corresponding ISO character, because the XML document is generated by the Velocity publishing framework.
I need a way to parse my documents successfully. I have tried reading in the entire XML string and converting the illegal characters to their equivalents, but it doesn't work. I would appreciate it if someone could suggest a solution (or even donate some code).
 
Tim Holloway
You can do it one of two ways.
1. Use the "escape" sequences, such as &amp;, &lt;
2. Wrap the items in a CDATA section, like so:
<other2><![CDATA[ < ]]></other2>
Once the XML parser has read the document, the unescaping has already been done for you. This is true for both character entities and CDATA sections.
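To illustrate that point, here is a minimal sketch, assuming a current JDK whose built-in JAXP parser supports the DOM Level 3 `getTextContent` method (the crimson.jar setup from the question may not). The class name `EscapeVsCdata` and the helper `rootText` are made up for this example. It parses one string that uses entity references and one that uses a CDATA section, and compares the text the parser hands back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class EscapeVsCdata {

    // Parses the XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<other2> &lt; &amp; </other2>";
        String cdata = "<other2><![CDATA[ < & ]]></other2>";
        // Both forms should yield the same unescaped text.
        System.out.println(rootText(escaped).equals(rootText(cdata)));
    }
}
```

In both cases the application sees the plain characters, so no extra decoding step is needed after parsing.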
[ July 12, 2002: Message edited by: Tim Holloway ]
 
Ervin Loh
The XML document is dynamically generated, so I cannot put a CDATA section into it.
I have been planning to write a while loop that repeatedly parses the XML document and replaces each illegal character with its corresponding ISO character. Can this work?
When the parser throws a SAXParseException, I'll call its getColumnNumber method to get the column where the character is.
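As an alternative to the parse-and-retry loop, one option is a single pre-filter pass over the generated text before it reaches the parser. The sketch below is only that, a sketch: the class `StrayCharEscaper` and its methods are hypothetical names, and the two regular expressions are a heuristic rather than a real XML tokenizer. It escapes any '&' that does not start an entity or character reference and any '<' that is not followed by a character that could begin a tag, comment, CDATA section, or processing instruction. A lone '>' is left alone because it is legal in element content.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXP or Velocity.
final class StrayCharEscaper {

    // '&' that does not begin a reference such as &amp; &#38; or &#x26;
    private static final Pattern STRAY_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // '<' that is not followed by a name start, '/', '!' or '?'
    private static final Pattern STRAY_LT =
            Pattern.compile("<(?![A-Za-z_:/!?])");

    // Returns the text with stray '&' and '<' replaced by &amp; and &lt;
    static String escapeStray(String xml) {
        String s = STRAY_AMP.matcher(xml).replaceAll("&amp;");
        return STRAY_LT.matcher(s).replaceAll("&lt;");
    }

    // Parses the (already filtered) XML text into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?>\n<attribute>\n"
                + "<other1> > </other1>\n<other2> < </other2>\n"
                + "<other3> & </other3>\n</attribute>";
        Document doc = parse(escapeStray(raw));
        System.out.println(doc.getElementsByTagName("other2").item(0).getTextContent());
    }
}
```

The heuristic has known gaps: a '<' immediately followed by a letter (as in `a<b`) still looks like a tag, and text inside an existing CDATA section would be escaped a second time. Escaping the values inside the Velocity template itself, if that becomes possible, would be more reliable than repairing the output afterwards.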
 