CDATA Whitespace issue

 
Manohar Karamballi
Ranch Hand
Posts: 227
All,

We are sending XML that contains CDATA sections. The issue I am facing is that the parser splits each CDATA section into several adjacent CDATA sections, one for every newline it encounters inside the section. For your reference, I am attaching the CDATA input and the CDATA as parsed by the parser.

CDATA input
==============
<xmlData><![CDATA[<?xml version="1.0"?>
<form>
<type>UNKNOWN</type>
<code>1234</code>
</form>]]>
</xmlData>

CDATA as parsed by parser
===========================
<xmlData><![CDATA[<?xml version="1.0"?> ]]><![CDATA[
<form> ]]><![CDATA[
<type>UNKNOWN</type> ]]><![CDATA[
<code>1234</code> ]]><![CDATA[
</form>]]>
</xmlData>
 
wise owen
Ranch Hand
Posts: 2023
For the parser to distinguish between non-ignorable and ignorable whitespace, there must be a DTD or schema associated with the XML document, and it must be used to validate the document during parsing.
 
Peer Reynders
Bartender
Posts: 2968
That's probably normal behavior for XML parsers, to prevent them from blowing up on humongous CDATA sections, which would explain why DOM has the setCoalescing method and StAX has the corresponding IS_COALESCING property. See
Creating Parsers with JAXP and Referencing Enterprise Beans
void setCoalescing(boolean value) - If set, the parser combines all adjacent Text nodes and CDATA section nodes into a single Text node in the Document tree. If not set, CDATA sections may appear as separate nodes in the tree.

javax.xml.parsers.DocumentBuilderFactory
javax.xml.stream.XMLInputFactory
Of course this is only going to help you if you are using StAX or DOM and are going to use the CDATA as text right away.
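
To illustrate the DOM case, here is a minimal sketch using the input from the first post. The class and method names (CoalescingDemo, parse, rootText, hasCdataChild) are made up for this example; only DocumentBuilderFactory.setCoalescing and the standard DOM calls come from JAXP. It assumes the XML is available as a String and that you only need the CDATA content as text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CoalescingDemo {

    // Parse an XML string, optionally asking the parser to coalesce
    // CDATA sections and adjacent text into plain Text nodes.
    static Document parse(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Text content of the root element; this concatenates all text and
    // CDATA children, however the parser happened to split them.
    static String rootText(String xml, boolean coalescing) throws Exception {
        return parse(xml, coalescing).getDocumentElement().getTextContent();
    }

    // True if any direct child of the root is still a CDATA section node.
    static boolean hasCdataChild(String xml, boolean coalescing) throws Exception {
        NodeList children = parse(xml, coalescing).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.CDATA_SECTION_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xmlData><![CDATA[<?xml version=\"1.0\"?>\n<form>\n"
                   + "<type>UNKNOWN</type>\n<code>1234</code>\n</form>]]>\n</xmlData>";
        System.out.println("CDATA child left: " + hasCdataChild(xml, true));
        System.out.println(rootText(xml, true));
    }
}
```

Even without coalescing, reading the element with getTextContent() gives you the pieces joined back together, so the split only matters if your code walks the child nodes one by one.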

Also note that hiding HTML (and XML, for that matter) in a CDATA section is discouraged; see:
The CDATASection Interface

Alternatively, you could simply encode the XML as text; some products already include utilities for that purpose, for example:
BEA's com.bea.document.XMLUtils encodeXML(xmlData:String):String and decodeXML(encoded:String):String methods
However, it shouldn't be difficult to write your own.
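
A rough sketch of what a home-grown version might look like follows. XmlTextCodec, encodeXml and decodeXml are hypothetical names for this example and are not the BEA API; the sketch only handles the five predefined XML entities and assumes the input contains no other entity or character references.

```java
public class XmlTextCodec {

    // Escape markup characters so the XML can travel as ordinary element text.
    // '&' must be replaced first, otherwise the entities added later
    // would themselves be escaped a second time.
    static String encodeXml(String xml) {
        return xml.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&apos;");
    }

    // Reverse of encodeXml. "&amp;" is restored last so that an encoded
    // "&amp;lt;" comes back as the literal text "&lt;" and not as "<".
    static String decodeXml(String encoded) {
        return encoded.replace("&lt;", "<")
                      .replace("&gt;", ">")
                      .replace("&quot;", "\"")
                      .replace("&apos;", "'")
                      .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String inner = "<form>\n<type>UNKNOWN</type>\n<code>1234</code>\n</form>";
        String wrapped = "<xmlData>" + encodeXml(inner) + "</xmlData>";
        System.out.println(wrapped);
        System.out.println(decodeXml(encodeXml(inner)).equals(inner));
    }
}
```

If the outer document is built and read through a DOM or StAX API, the parser normally does this escaping and unescaping for you when you set or read text content, so a codec like this is mainly needed when the XML is assembled by string concatenation.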
[ February 10, 2006: Message edited by: Peer Reynders ]
 