CDATA Whitespace issue

Manohar Karamballi
Ranch Hand

Joined: Jul 17, 2001
Posts: 227
All,

We are sending XML that contains CDATA sections. The issue I am facing is that the parser splits the content into a separate CDATA section at each newline it encounters within a CDATA section. For your reference, I am attaching the CDATA input and the CDATA as produced by the parser.

CDATA input
==============
<xmlData><![CDATA[<?xml version="1.0"?>
<form>
<type>UNKNOWN</type>
<code>1234</code>
</form>]]>
</xmlData>

CDATA as parsed by parser
===========================
<xmlData><![CDATA[<?xml version="1.0"?> ]]><![CDATA[
<form> ]]><![CDATA[
<type>UNKNOWN</type> ]]><![CDATA[
<code>1234</code> ]]><![CDATA[
</form>]]>
</xmlData>
wise owen
Ranch Hand

Joined: Feb 02, 2006
Posts: 2023
For the parser to distinguish between non-ignorable and ignorable whitespace, there must be a DTD or schema associated with the XML document, and it must be used to validate the document during parsing.
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
That's probably normal behavior for XML parsers, to keep them from blowing up on humongous CDATA sections, which would explain why DOM has the setCoalescing method and StAX has the equivalent IS_COALESCING property. See
Creating Parsers with JAXP and Referencing Enterprise Beans
void setCoalescing(boolean value) - If set, the parser combines all adjacent Text nodes and CDATA section nodes into a single Text node in the Document tree. If not set, CDATA sections may appear as separate nodes in the tree.

javax.xml.parsers.DocumentBuilderFactory
javax.xml.stream.XMLInputFactory
Of course this is only going to help you if you are using StAX or DOM and are going to use the CDATA as text right away.
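
As a rough illustration of the DOM route, here is a minimal sketch that enables coalescing before parsing. The class name CoalescingDemo and the helper rootText are made up for this example, and the input is the sample from the first post; treat it as one possible setup rather than a fix that is known to match your parser configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the JAXP/DOM calls are real API.
final class CoalescingDemo {

    // Parses the XML with coalescing enabled and returns the text of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the parser to merge adjacent Text and CDATA section nodes into Text nodes.
            factory.setCoalescing(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xmlData><![CDATA[<?xml version=\"1.0\"?>\n"
                + "<form>\n<type>UNKNOWN</type>\n<code>1234</code>\n</form>]]>\n"
                + "</xmlData>";
        System.out.println(rootText(xml));
    }
}
```

Note that getTextContent concatenates the child text either way; coalescing mainly matters when you walk the child nodes yourself or serialize the tree again. With StAX, the comparable switch would be setting XMLInputFactory.IS_COALESCING to Boolean.TRUE via setProperty.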

Also note that hiding HTML (and XML, for that matter) in a CDATA section is discouraged; see
The CDATASection Interface

Alternatively, you could simply encode the XML as text; some products already include utilities for that purpose, for example
BEA's com.bea.document.XMLUtils encodeXML(xmlData:String):String and decodeXML(encoded:String):String methods
However, it shouldn't be difficult to write your own.
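
For what "write your own" might look like, here is a minimal sketch that escapes the five predefined XML entities and reverses the escaping. The class XmlTextCodec and its encode/decode methods are hypothetical names for this example; this is not BEA's implementation and makes no claim about how that utility behaves.

```java
// Hypothetical helper; escapes XML markup so it can travel as ordinary element text.
final class XmlTextCodec {

    // Replaces the five predefined XML special characters with entity references.
    static String encode(String xml) {
        return xml.replace("&", "&amp;")   // must come first so later entities are not re-escaped
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&apos;");
    }

    // Reverses encode(); "&amp;" is handled last so "&amp;lt;" becomes "&lt;", not "<".
    static String decode(String encoded) {
        return encoded.replace("&lt;", "<")
                      .replace("&gt;", ">")
                      .replace("&quot;", "\"")
                      .replace("&apos;", "'")
                      .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String payload = "<form><type>UNKNOWN</type><code>1234</code></form>";
        String encoded = encode(payload);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(payload));
    }
}
```

If the encoded string is placed in an element through a DOM or StAX writer, that writer usually escapes text on its own, so calling encode first would escape it twice; the sketch is meant for cases where you build the outer XML as a string yourself.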
[ February 10, 2006: Message edited by: Peer Reynders ]
 