Regex for adding CDATA to XML nodes

 
Ranch Hand
Posts: 167
Hello folks,

Say I have an XML like this:

<Node>Foo</Node><Node>Bar</Node>

I need to add a CDATA prefix/suffix to the content of every <Node> element. The desired output would be:

<Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>

Do you think it's possible to do this using a regular expression, without splitting the nodes and applying the transformation to each substring?

Thanks in advance,
Tiago
 
author
Posts: 14112
It should be possible using String.replaceAll. Is that what you were looking for?
 
Ranch Hand
Posts: 148
It would be tricky to try and do this with regular expression backreferences to replace the entire <Node>.*</Node> string with <Node><![CDATA[.*]]></Node>. You can get around this by doing it in two passes. First replace the opening tag, then the closing tag.
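
A minimal sketch of the two-pass idea, assuming the tags are always written exactly as <Node> and </Node> (no attributes, no self-closing form). The class and method names are made up for illustration. String.replace is used instead of replaceAll because it treats both arguments as literal text.

```java
class TwoPassCdata {
    // Pass 1 opens a CDATA section right after every opening tag,
    // pass 2 closes it right before every closing tag.
    static String wrap(String xml) {
        return xml.replace("<Node>", "<Node><![CDATA[")
                  .replace("</Node>", "]]></Node>");
    }

    public static void main(String[] args) {
        // prints <Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>
        System.out.println(wrap("<Node>Foo</Node><Node>Bar</Node>"));
    }
}
```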

 
Ilja Preuss
author
Posts: 14112
Why would that be tricky? I haven't tried, but shouldn't something like the following work?

str.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[\\1]]></Node>");
 
Author
Posts: 836
Be careful of greedy matching (the default matching mode) in the previous example: otherwise the .* will match everything between the very first <Node> and the very last </Node> in the document!
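
To make the difference concrete, here is a small sketch that compares what the first capture group holds under a greedy and a reluctant quantifier. The class and helper names are hypothetical, and the input is the sample from the first post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns the text captured by group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<Node>Foo</Node><Node>Bar</Node>";
        // Greedy: .* runs to the last </Node>, so the group spans both nodes.
        System.out.println(firstGroup("<Node>(.*)</Node>", xml));  // Foo</Node><Node>Bar
        // Reluctant: .*? stops at the first </Node>.
        System.out.println(firstGroup("<Node>(.*?)</Node>", xml)); // Foo
    }
}
```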
 
Bill Cruise
Ranch Hand
Posts: 148
That's the first thing I tried too, but the backreference doesn't work with String's replaceAll method, because the enclosing parentheses are in the regex and the backreference to the group is in the replacement String.
 
Charles Lyons
Author
Posts: 836
You need to use dollar $ signs for captured sub-sequences... See: Matcher.replaceAll(String)
[ July 03, 2008: Message edited by: Charles Lyons ]
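
For reference, a sketch of the single-pass version with the group referenced as $1. It assumes flat, non-nested <Node> elements on one line; the class and method names are made up. In a Java replacement string a backslash only escapes the next character, so "\\1" would insert a literal 1 rather than the captured text.

```java
class DollarGroupReplace {
    static String wrap(String xml) {
        // (.*?) is reluctant, so each match ends at the nearest </Node>.
        // $1 in the replacement stands for whatever group 1 captured.
        return xml.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[$1]]></Node>");
    }

    public static void main(String[] args) {
        // prints <Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>
        System.out.println(wrap("<Node>Foo</Node><Node>Bar</Node>"));
    }
}
```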
 
Marshal
Posts: 28226
Regex problems aside, you would also have to unescape things like ampersands and less-than symbols. For example you would want to convert to
[ July 03, 2008: Message edited by: Paul Clapham ]
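
One way this could look, as a sketch only: unescape the captured text before wrapping it. The input below is a made-up example, the class and method names are hypothetical, and only the five predefined XML entities are handled (numeric character references are not). Matcher.quoteReplacement keeps any $ or backslash in the node text from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnescapeIntoCdata {
    private static final Pattern NODE = Pattern.compile("<Node>(.*?)</Node>");

    // Handles only the five predefined XML entities.
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
    }

    static String wrap(String xml) {
        Matcher m = NODE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "<Node><![CDATA[" + unescape(m.group(1)) + "]]></Node>";
            // quoteReplacement makes the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <Node><![CDATA[a & b < c]]></Node>
        System.out.println(wrap("<Node>a &amp; b &lt; c</Node>"));
    }
}
```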
 
Ilja Preuss
author
Posts: 14112

Originally posted by Charles Lyons:
You need to use dollar $ signs for captured sub-sequences... See: Matcher.replaceAll(String)

[ July 03, 2008: Message edited by: Charles Lyons ]



Good point!
 
Ilja Preuss
author
Posts: 14112

Originally posted by Charles Lyons:
Be careful of greedy matching (the default matching mode) in the previous example: otherwise the .* will match everything between the very first <Node> and the very last </Node> in the document!



If you are talking about my example, those are exactly *not* greedy.
 
Charles Lyons
Author
Posts: 836

If you are talking about my example, those are exactly *not* greedy.

Nope - I did notice the extra "?" in your example. It's just something very easy to overlook, generally not well understood and I thought a good idea to point out!
 
Master Rancher
Posts: 4830
Another complication: can node elements be nested inside other Nodes? That could get ugly:



That might become:



The above is almost certainly not what you want here, but I don't know what you do want. With luck, nesting never occurs, and this will be easier.
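
As a rough illustration with a made-up nested input (not necessarily the example this post had in mind), the reluctant replaceAll discussed earlier in the thread pairs the outer opening tag with the inner closing tag. The class and method names are hypothetical.

```java
class NestedNodeDemo {
    static String wrap(String xml) {
        return xml.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[$1]]></Node>");
    }

    public static void main(String[] args) {
        // The match starts at the outer <Node> and ends at the first </Node>,
        // which belongs to the inner element.
        // prints <Node><![CDATA[a<Node>b]]></Node>c</Node>
        System.out.println(wrap("<Node>a<Node>b</Node>c</Node>"));
    }
}
```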
 
Ranch Hand
Posts: 225

Paul Clapham:
... you would also have to unescape things like ampersands and less-than symbols.


While looking out for ]]>, which can't be part of a CDATA section at all.
[ July 05, 2008: Message edited by: Carey Evans ]
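
A common workaround, sketched here with hypothetical names, is to split the text at every ]]> so that the ]] ends one CDATA section and the > starts the next. A parser then sees two adjacent sections whose contents concatenate back to the original text.

```java
class CdataEscape {
    // Wraps text in CDATA, splitting any "]]>" across two adjacent sections.
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        // prints <![CDATA[a]]]]><![CDATA[>b]]>
        System.out.println(toCdata("a]]>b"));
    }
}
```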
 
Tiago Fernandez
Ranch Hand
Posts: 167
Thanks everyone! I've taken the following solution for the problem I was having anyways:



Cheers!