
Extraction of CData Part and later its embedding

Norman Meister
Ranch Hand

Joined: Jul 03, 2009
Posts: 48
Hi All,
Could someone please help me figure out how to extract the CDATA part of my XML, which is itself XML? Once I have the CDATA XML, I will apply an XSLT script to it and then embed the processed XML back in the CDATA area.

For example:

I would like to extract the
, process it with my XSLT script, and then embed it back in the original XML, e.g. my desired output:

The masking
above is done by my xslt script.

What I need to know is how I can extract the CDATA part and embed it back.

The second important thing is that the structure of my input XML is not fixed. It might or might not contain a CDATA section. If there is no CDATA section, the XSLT script should simply be applied to the document and masking performed if required. The masking logic is handled entirely in the XSLT script. If a CDATA section is present, the extract -> XSLT script -> embed logic is applied.

A pure XSLT-based solution is not feasible; I need a combined Java and XSLT solution.

So in short, I am looking for a way to check whether CDATA is present and, if it is, to extract its contents, run the XSLT on them, and embed the result back in the original XML.

Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

Well, as far as the parser is concerned, your CDATA section is just treated as a string. So you extract it by the trivial operation of getting a text node. Whether you can tell whether it's a CDATA section or not, I don't know, but you should ignore that requirement because CDATA is just a convenience for producers of documents who don't want to deal with escaping. It's not meant to have syntactic significance.
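On the "can you tell" question, for DOM specifically: a parser that is not set to coalescing reports a CDATA section as its own node type, so a check is possible if it is really wanted. Here is a minimal sketch using the standard JAXP DOM API. The class name `CDataCheck` and the element names in `main` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not from the original thread.
final class CDataCheck {

    // Parse an XML string into a DOM, keeping CDATA sections as separate nodes.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Coalescing would merge CDATA into ordinary text nodes, so keep it off.
        factory.setCoalescing(false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // True if any direct child of the element is a CDATA section.
    static boolean hasCData(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // "msg", "payload" and "note" are assumed element names.
        Document doc = parse("<msg><payload><![CDATA[<card/>]]></payload><note>plain</note></msg>");
        Element payload = (Element) doc.getElementsByTagName("payload").item(0);
        Element note = (Element) doc.getElementsByTagName("note").item(0);
        System.out.println("payload has CDATA: " + hasCData(payload));
        System.out.println("note has CDATA: " + hasCData(note));
    }
}
```

Either way, `getTextContent()` returns the same string whether the producer wrote CDATA or escaped text.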

Then you just do your XSLT on the string you extracted and produce a new string as the result of the transformation. You put that string back using the trivial operation of setting a text node. (By the way I assume you were using DOM for this, since you said you weren't using XSLT, but the same would apply if you were using a SAX filter process.)
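The extract, transform, and put-back steps described above could look roughly like this with DOM and the standard `javax.xml.transform` API. This is only a sketch under assumptions: the class name `CDataRoundTrip`, the element names `msg`, `payload`, `card` and `secret`, and the masking stylesheet are all invented here, since the original example XML and XSLT are not shown in the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not from the original thread.
final class CDataRoundTrip {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Apply a stylesheet (given as a string) to an XML string and return the result string.
    static String transform(String xml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        // The result is embedded inside another document, so drop the XML declaration.
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Read the element's text (the embedded XML), transform it, and write it back.
    static void processEmbedded(Element holder, String xslt) throws Exception {
        String inner = holder.getTextContent();
        // setTextContent replaces all children with a single plain text node.
        holder.setTextContent(transform(inner, xslt));
    }

    public static void main(String[] args) throws Exception {
        // Assumed masking stylesheet: copy everything, replace the text of <secret>.
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='@*|node()'><xsl:copy>"
          + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
          + "<xsl:template match='secret/text()'>****</xsl:template>"
          + "</xsl:stylesheet>";
        Document doc = parse(
            "<msg><payload><![CDATA[<card><secret>1234</secret></card>]]></payload></msg>");
        Element payload = (Element) doc.getElementsByTagName("payload").item(0);
        processEmbedded(payload, xslt);
        System.out.println(payload.getTextContent());
    }
}
```

Note that after `setTextContent` the element holds an ordinary text node, so a serializer will escape the markup unless it is told to write that element as CDATA.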

If you're using an identity transformation to convert your DOM to a document and you're stuck with the incorrect requirement of producing a CDATA section for that element, then I believe the <xsl:output> element has an attribute which allows you to specify which elements should be output as CDATA. So instead use a transformation which is the identity transformation with an <xsl:output> element like that.
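The attribute in question is `cdata-section-elements`. When the identity transformation is run from Java rather than from a stylesheet, the same setting is available as the `OutputKeys.CDATA_SECTION_ELEMENTS` output property. A minimal sketch follows; the class name `CDataSerializer` and the element name `payload` are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not from the original thread.
final class CDataSerializer {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize a DOM with the identity transformer, writing the text of the
    // named elements (whitespace-separated list) as CDATA sections.
    static String serialize(Document doc, String cdataElements) throws Exception {
        // newTransformer() with no stylesheet gives the identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElements);
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The embedded markup is stored as escaped text here; "payload" is an assumed name.
        Document doc = parse("<msg><payload>&lt;card/&gt;</payload></msg>");
        System.out.println(serialize(doc, "payload"));
    }
}
```

In a stylesheet the equivalent would be `<xsl:output cdata-section-elements="payload"/>` added to an identity transformation.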
Norman Meister
Ranch Hand

Joined: Jul 03, 2009
Posts: 48
I finally did it with a plain XSLT-based solution.

Thanks anyway.