filtering illegal characters in xml documents

 
Greenhorn
Posts: 6
I have the following XML document:
<?xml version="1.0"?>
<attribute>
<other1> > </other1>
<other2> < </other2>
<other3> & </other3>
</attribute>
I'm using JAXP to parse the document, and I get the following error when parsing it:
C:\DOMEcho>java -classpath '.\;C:\DOMEcho;C:\lib\crimson.jar;C:\lib\jaxp.jar;C:\
lib\xalan.jar;.' DOMEcho attribute.xml
Fatal Error: URI=file:C:/DOMEcho/attribute.xml Line=4: The content beginning "<
" is not legal markup. Perhaps the " " ( character should be a letter.
My investigation suggests that a character such as '>' (as in <other1> > </other1>) is invalid. Note that I cannot change '>' to its corresponding escaped (ISO) equivalent by hand, because the XML document is generated by Velocity, a publishing framework.
Any ideas on how to solve this so that I can parse my documents successfully? I have tried reading in the entire XML string and converting the illegal characters to their equivalents, but it doesn't work. I would appreciate it if someone could suggest a solution (or even donate some code).
 
Saloon Keeper
Posts: 28133
You can do it in one of two ways:
1. Use the "escape" sequences, such as &amp; and &lt;.
2. Wrap the items in a CDATA section, for example <![CDATA[ < ]]>.

Once the XML parser has read in the data, the translation (unescaping) will have been done for you. This is true for both character entities and CDATA sections.
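To illustrate that last point, here is a minimal sketch, assuming a reasonably recent JDK whose bundled JAXP parser is used. The class name EscapedTextDemo, the helper textOf and the extra <other4> element are made up for this example and are not part of the original post. It parses an escaped and a CDATA-wrapped variant of the document and reads the text back through the DOM, where it should already be unescaped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
final class EscapedTextDemo {

    // Parses an XML string with JAXP and returns the text content
    // of the first element with the given tag name.
    static String textOf(String xml, String tag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers stay simple.
            throw new IllegalStateException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<?xml version=\"1.0\"?>\n"
            + "<attribute>\n"
            + "<other1> &gt; </other1>\n"                 // escape sequence
            + "<other2> &lt; </other2>\n"                 // escape sequence
            + "<other3> &amp; </other3>\n"                // escape sequence
            + "<other4><![CDATA[ < & > ]]></other4>\n"    // CDATA section
            + "</attribute>";
        // The parser hands back the literal characters, not the escapes.
        System.out.println("[" + textOf(xml, "other2") + "]");
        System.out.println("[" + textOf(xml, "other4") + "]");
    }
}
```

Either way, the application code never sees &lt; or the CDATA markers, only the characters they stand for.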
[ July 12, 2002: Message edited by: Tim Holloway ]
 
Ervin Loh
Greenhorn
Posts: 6
The XML document is dynamically generated, so I cannot put a CDATA section into it.
I have been planning to write a while loop that repeatedly parses the XML document and replaces each illegal character with its corresponding escaped equivalent. Can this work?
When the parser throws a SAXParseException, I'll call its getColumnNumber method (to get the column where the character is).
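As a possible alternative to the parse-and-retry loop, the generated string could be cleaned in a single pass before it reaches the parser. The sketch below is not the retry loop described above but a swapped-in regex heuristic, and the names StrayMarkupEscaper and escapeStrayMarkup are invented for this example. It assumes that a '<' which is not followed by a name character, '/', '!' or '?' is stray text, and that an '&' which does not begin something shaped like an entity or character reference is stray text. Note that a bare '>' is legal in element content, and the error quoted earlier reports line 4, the line with '<', so only '<' and '&' are touched.

```java
import java.util.regex.Pattern;

// Hypothetical helper; a heuristic sketch, not a general XML repair tool.
final class StrayMarkupEscaper {

    // '&' that does not begin a reference such as &amp; or &#38; or &#x26;
    // (assumption: entity names use only ASCII letters and digits)
    private static final Pattern STRAY_AMP =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // '<' that is not followed by a character that could start a tag,
    // an end tag, a comment, a CDATA section or a processing instruction
    private static final Pattern STRAY_LT =
        Pattern.compile("<(?![A-Za-z_:/!?])");

    static String escapeStrayMarkup(String xml) {
        // Escape ampersands first so the "&lt;" added next is left alone.
        String ampersandsFixed = STRAY_AMP.matcher(xml).replaceAll("&amp;");
        return STRAY_LT.matcher(ampersandsFixed).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        String generated =
            "<?xml version=\"1.0\"?>\n"
            + "<attribute>\n"
            + "<other1> > </other1>\n"
            + "<other2> < </other2>\n"
            + "<other3> & </other3>\n"
            + "</attribute>";
        System.out.println(escapeStrayMarkup(generated));
    }
}
```

This would still get text like a<b wrong (it looks like the start of a tag), and it would also rewrite '<' and '&' inside existing CDATA sections or comments, so escaping the values where Velocity writes them out is likely the more robust fix if that is at all possible.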
 