CDATA Whitespace issue

 
Ranch Hand
Posts: 227
All,

We are sending XML that contains CDATA sections. The issue I am facing is that the parser starts a new CDATA section at each line feed it encounters within a CDATA section, so one section comes out as several adjacent ones. For reference, I am attaching the CDATA input and the CDATA as output by the parser.

CDATA input
==============
<xmlData><![CDATA[<?xml version="1.0"?>
<form>
<type>UNKNOWN</type>
<code>1234</code>
</form>]]>
</xmlData>

CDATA as parsed by parser
===========================
<xmlData><![CDATA[<?xml version="1.0"?> ]]><![CDATA[
<form> ]]><![CDATA[
<type>UNKNOWN</type> ]]><![CDATA[
<code>1234</code> ]]><![CDATA[
</form>]]>
</xmlData>
 
Ranch Hand
Posts: 2023
For the parser to distinguish between non-ignorable and ignorable whitespace, there must be a DTD or schema associated with the XML document, and it must be used to validate the document during parsing.
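
To illustrate the point about ignorable whitespace, here is a minimal sketch assuming the JDK's built-in JAXP parser. The class name IgnorableWhitespaceDemo, the childCount helper, and the inline DTD are made up for this example and are not part of the original poster's documents. It parses the same form once with DTD validation and once without, and counts the child nodes of the root element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {
    // Hypothetical sample: the form from the question plus an internal DTD,
    // so the parser can tell that <form> has element-only content.
    static final String INPUT =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE form [\n"
        + "  <!ELEMENT form (type, code)>\n"
        + "  <!ELEMENT type (#PCDATA)>\n"
        + "  <!ELEMENT code (#PCDATA)>\n"
        + "]>\n"
        + "<form>\n"
        + "  <type>UNKNOWN</type>\n"
        + "  <code>1234</code>\n"
        + "</form>";

    /** Returns the number of child nodes of the root element. */
    static int childCount(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Dropping ignorable whitespace only works in validating mode.
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler silently ignores warnings and recoverable errors.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getChildNodes().getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("validating:     " + childCount(INPUT, true));
        System.out.println("non-validating: " + childCount(INPUT, false));
    }
}
```

Without validation the line breaks and indentation between the child elements stay in the tree as Text nodes; with validation against the DTD the parser can drop them.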
 
Bartender
Posts: 2968
That's probably normal behavior for XML parsers, to prevent them from blowing up on humongous CDATA sections, which would explain why DOM has the setCoalescing method and StAX has the equivalent IS_COALESCING property. See
Creating Parsers with JAXP and Referencing Enterprise Beans

void setCoalescing(boolean value) - If set, the parser combines all adjacent Text nodes and CDATA section nodes into a single Text node in the Document tree. If not set, CDATA sections may appear as separate nodes in the tree.
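
A minimal sketch of what that setting does with DOM, assuming the JDK's built-in JAXP parser. The class name DomCoalescingDemo and the parseRoot helper are made up for illustration; INPUT is the sample from the original question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomCoalescingDemo {
    // The CDATA input from the question, as a Java string.
    static final String INPUT =
        "<xmlData><![CDATA[<?xml version=\"1.0\"?>\n"
        + "<form>\n"
        + "<type>UNKNOWN</type>\n"
        + "<code>1234</code>\n"
        + "</form>]]>\n"
        + "</xmlData>";

    /** Parses the document and returns its root element. */
    static Element parseRoot(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Merge CDATA sections and adjacent text into a single Text node.
            factory.setCoalescing(coalescing);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(INPUT, true);
        System.out.println("child nodes: " + root.getChildNodes().getLength());
        System.out.println(root.getTextContent());
    }
}
```

With coalescing switched on, xmlData should end up with a single Text child holding the embedded document, which can then be handed to a second parser if needed.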


javax.xml.parsers.DocumentBuilderFactory
javax.xml.stream.XMLInputFactory
Of course this is only going to help you if you are using StAX or DOM and are going to use the CDATA as text right away.
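
For StAX, a comparable sketch, again assuming the JDK's built-in implementation. In javax.xml.stream the setting is exposed as the XMLInputFactory.IS_COALESCING property and is set through setProperty. The class name StaxCoalescingDemo and the readText helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCoalescingDemo {
    // The CDATA input from the question, as a Java string.
    static final String INPUT =
        "<xmlData><![CDATA[<?xml version=\"1.0\"?>\n"
        + "<form>\n"
        + "<type>UNKNOWN</type>\n"
        + "<code>1234</code>\n"
        + "</form>]]>\n"
        + "</xmlData>";

    /** Returns all character data in the document as one string. */
    static String readText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data, including CDATA.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // Appending every text event keeps the result the same
                // however many events the reader chooses to report.
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read sample document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readText(INPUT));
    }
}
```

Appending every text event to a buffer is the defensive choice here: the code still produces the full embedded document even if a particular parser reports the content in several pieces.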

Also note that hiding HTML (and XML, for that matter) in a CDATA section is discouraged; see:
The CDATASection Interface

Alternatively, you could simply encode the XML as text; some products already include utilities for that purpose, for example
BEA's com.bea.document.XMLUtils encodeXML(xmlData:String):String and decodeXML(encoded:String):String methods.
However, it shouldn't be difficult to write your own.
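
A rough sketch of such a home-grown pair, not BEA's implementation. The class name XmlTextCodec and both method names are hypothetical, and only the five predefined XML entities are handled.

```java
class XmlTextCodec {
    /** Escapes markup characters so the XML can travel as plain text. */
    static String encodeXml(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 16);
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses encodeXml. */
    static String decodeXml(String encoded) {
        // "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
        return encoded
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String inner = "<form><type>UNKNOWN</type><code>1234</code></form>";
        String encoded = encodeXml(inner);
        System.out.println(encoded);
        System.out.println(decodeXml(encoded));
    }
}
```

In practice the explicit decode step is rarely needed: if the encoded string is placed as ordinary element text, the receiving parser unescapes it when the text is read. Likewise, most XML writers escape text content on their own, so encoding by hand is mainly useful when the outer document is assembled by string concatenation.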
[ February 10, 2006: Message edited by: Peer Reynders ]
 