
Remove <![CDATA[ and ]]> from an XML file

JayaSiji Gopal
Ranch Hand
Posts: 303
Hi!

I have an XML file with an element like this:

<main>
<![CDATA[
<div class="hugin">
<div class="hugin">KONCERNCHEFENS KOMMENTARER</div>
<div class="hugin"> </div>
<div class="hugin">"Vi har sett den största ökningen i antalet mobiltelefonanvändare någonsin. Under 2004 tillkom 300 miljoner nya användare och nu har 27 procent av världens befolkning tillgång till mobiltelefoni", säger Carl-Henric Svanberg, VD och koncernchef för Ericsson. "För oss som har en vision om en värld där alla kan kommunicera med varandra - när som helst och var som helst - är detta en spännande utveckling.</div>
]]>
</main>

I want to write it to an XML document without the CDATA markers, like this:

<main>
<div class="hugin">
<div class="hugin">KONCERNCHEFENS KOMMENTARER</div>
<div class="hugin"> </div>
<div class="hugin">"Vi har sett den största ökningen i antalet mobiltelefonanvändare någonsin. Under 2004 tillkom 300 miljoner nya användare och nu har 27 procent av världens befolkning tillgång till mobiltelefoni", säger Carl-Henric Svanberg, VD och koncernchef för Ericsson. "För oss som har en vision om en värld där alla kan kommunicera med varandra - när som helst och var som helst - är detta en spännande utveckling.</div>
</main>

Any ideas? Please help.
Madhav Lakkapragada
Ranch Hand
Posts: 5040
Ummm... before I dive into suggesting something here:
there's a reason why CDATA is used (or abused?). Are you really sure you want to do this? Does this CDATA section always contain XML-like content?

- m
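One way to answer that question for a given file is to try parsing the CDATA text as XML. The following is a minimal sketch, not a complete solution. It assumes the text is wrapped in a synthetic `<wrapper>` root element (a hypothetical name), because a fragment with several top-level elements is not a well-formed document on its own. The class and method names are also hypothetical. As posted, the sample in the question would fail this check because its outer `<div class="hugin">` is never closed, unless the snippet was simply truncated.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CdataCheck {

    /**
     * Returns true if the CDATA text parses as well-formed XML once it is
     * wrapped in a synthetic root element (the name "wrapper" is arbitrary).
     */
    public static boolean isWellFormedFragment(String cdataText) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but stops the parser
            // from printing "[Fatal Error]" messages to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<wrapper>" + cdataText + "</wrapper>")));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. an unclosed tag or a bare '&'.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If this returns false, stripping the markers would leave you with a file that no XML parser will accept.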
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13055
I feel your pain. I have a client whose XML documents contain huge chunks of CDATA that is also formatted as XML. If you don't need to actually manipulate that content as XML, you are lucky.
The simplest approach might be to treat it as a text-filtering problem and never go through an XML parser at all: read the input line by line and write the output line by line.
Look for the start or the end of a CDATA section in each line and just cut it out of the line.
Bill
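Bill's line-by-line filter could look roughly like this in Java. This is a minimal sketch under a few assumptions: the class name `CdataStripper` and the two command-line arguments are hypothetical, and UTF-8 is assumed for both input and output. If the file uses another encoding (for example ISO-8859-1), change both charsets. The filter deletes every `<![CDATA[` and `]]>` it sees without any XML awareness. That means it will also alter those strings inside comments or attribute values. It also means any raw `&` or `<` that was protected by the CDATA section becomes markup in the output, which can leave the result not well-formed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CdataStripper {

    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    /** Removes every CDATA start and end marker from a single line of text. */
    public static String stripMarkers(String line) {
        return line.replace(CDATA_START, "").replace(CDATA_END, "");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CdataStripper <input.xml> <output.xml>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Assumption: the file is UTF-8; use the encoding the file really has.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            // Copy the file line by line, cutting the markers out of each line.
            while ((line = reader.readLine()) != null) {
                writer.write(stripMarkers(line));
                writer.newLine();
            }
        }
    }
}
```

Run it as `java CdataStripper input.xml output.xml`. A line that held only `<![CDATA[` or `]]>`, as in the sample, simply becomes an empty line in the output.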