
removing all element names from CDATA section

Vijay Chouhan
Ranch Hand

Joined: Sep 01, 2005
Posts: 30
Hi Ranchers,

I have the following XML element.



I need to get the text content of the CDATA section with all the tags removed, i.e. the output should be:



I tried the following script.



However, this script converts all the angle brackets in the CDATA section to the corresponding entities and copies them to the output along with the tag names. The output from the above script looks like the following.



Is there any way I can obtain only the text content and remove all tags from the input?

I hope my question makes sense.

Thanks,
Vijay
[ January 17, 2007: Message edited by: Vijay Chouhan ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907
    

Why? Using CDATA specifically says "This is not XML markup, this is just text. Do not treat it as markup." But you're saying you want to treat it as markup anyway?

Well, okay, you can do that. But you can't assume that it's going to be well-formed XML markup. It might have some < and > characters but you can't assume they will appear in pairs. But you could write some code that copied the text into a new location, but stopped copying it when you hit < and started again after you hit >.

But I'm still asking why you want to do that. I suspect there's some misunderstanding going on.
Vijay Chouhan
Ranch Hand

Joined: Sep 01, 2005
Posts: 30
Thanks for replying. You are absolutely right that this data should not have been enclosed in CDATA sections. However, the XML data is written by a different application, and it is hard to get its owners to change their code. We do, however, have a guarantee that the contents are going to be well-formed markup.

"But you could write some code that copied the text into a new location, but stopped copying it when you hit < and started again after you hit >."


This is exactly what I need to do. I have no idea how to accomplish this in XSLT. Is there an XSLT function that allows me to do search and replace by specifying regular expressions?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907
    

You want to do this in XSLT? And all you want to do is remove the tags? Then you can write a template to do that. Here's pseudo-XSLT for the template, which would have a parameter containing the string to be de-tagged:

It's a recursive template; this is quite a common technique in declarative languages like XSLT.
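
The original listing is not preserved in this copy of the thread. The following is only a minimal sketch of what such a recursive template might look like, assuming XSLT 1.0 and input that is well formed, as guaranteed above. The template name strip-tags, the parameter name text, and the match="/*" pattern are hypothetical, since the real element name is not shown. It also assumes no attribute value inside the CDATA text contains a literal > character.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point: replace "/*" with the element that
       actually holds the CDATA section. -->
  <xsl:template match="/*">
    <xsl:call-template name="strip-tags">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Copies $text to the output, skipping everything from each "<"
       up to and including the next ">". -->
  <xsl:template name="strip-tags">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A "<" followed somewhere by a ">" is treated as a tag. -->
      <xsl:when test="contains($text, '&lt;')
                      and contains(substring-after($text, '&lt;'), '&gt;')">
        <!-- Emit the text that precedes the tag. -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <!-- Recurse on whatever follows the closing ">". -->
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text"
              select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No complete tag left: emit the remainder unchanged. -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

XSLT 1.0 has no regular-expression functions, which is why the sketch recurses with substring-before and substring-after. On an XSLT 2.0 processor, the XPath replace() function takes a regular expression and could likely do the same job in a single call.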
Vijay Chouhan
Ranch Hand

Joined: Sep 01, 2005
Posts: 30
Thanks a ton for the suggestion. It solved my problem. :-)
 