I need to get the text content of the CDATA section with all the tags removed, i.e., the output should be
I tried the following script.
However, this script converts all the angle brackets in the CDATA section to the corresponding entities and copies them to the output along with the tag names. The output from the above script looks like the following.
Is there any way I can obtain only the text content and remove all tags from the input?
Why? Using CDATA specifically says "This is not XML markup, this is just text. Do not treat it as markup." But you're saying you want to treat it as markup anyway?
Well, okay, you can do that. But you can't assume that it's going to be well-formed XML markup. It might have some < and > characters but you can't assume they will appear in pairs. But you could write some code that copied the text into a new location, but stopped copying it when you hit < and started again after you hit >.
But I'm still asking why you want to do that. I suspect there's some misunderstanding going on.
Joined: Sep 01, 2005
Thanks for replying. You are absolutely right in stating that this data should not have been enclosed in CDATA sections. However, the XML data is written by a different application, and it is tough to ask them to change their code. We do, however, have a guarantee that the contents are going to be well-formed markup.
"But you could write some code that copied the text into a new location, but stopped copying it when you hit < and started again after you hit >."
This is exactly what I need to do. I have no idea how to accomplish this in XSLT. Is there an XSLT function that allows me to do search and replace by specifying regular expressions?
You want to do this in XSLT? And all you want to do is remove the tags? Then you can write a template to do that. Here's pseudo-XSLT for the template, which would have a parameter containing the string to be de-tagged:
It's a recursive template, which is quite a common technique in declarative languages like XSLT.
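A minimal sketch of such a recursive template, assuming XSLT 1.0, might look like the following. The element name content, the template name strip-tags, and the parameter name text are all hypothetical and chosen only for illustration. The template copies everything before the next <, skips ahead to just past the following >, and calls itself on the remainder.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output, so the stripped string is written without escaping. -->
  <xsl:output method="text"/>

  <!-- Assumption: "content" stands in for the element holding the CDATA section. -->
  <xsl:template match="content">
    <xsl:call-template name="strip-tags">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Recursive template: removes everything between each "<" and the next ">". -->
  <xsl:template name="strip-tags">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A "<" with a ">" somewhere after it: treat that span as a tag. -->
      <xsl:when test="contains($text, '&lt;') and contains(substring-after($text, '&lt;'), '&gt;')">
        <!-- Copy the text that precedes the tag. -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <!-- Recurse on whatever follows the closing ">". -->
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text"
                          select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No complete tag left: copy the rest unchanged. -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This sketch treats the first > after a < as the end of the tag, so an attribute value that itself contains > would be cut short. It also recurses once per tag, which may be a concern for very long strings on some processors. If an XSLT 2.0 processor is available, the replace() function with a regular expression such as <[^>]*> (with the < escaped as an entity inside the stylesheet) should do the same job in one call.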
Thanks a ton for the suggestion. It solved my problem. :-)