Regex for adding CDATA to XML nodes

Ranch Hand

Posts: 167

posted 15 years ago

Hello folks,

Say I have an XML like this:

<Node>Foo</Node><Node>Bar</Node>

I need to add a CDATA prefix/suffix to the content of every <Node/> element. The desired output would be:

<Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>

Do you think it's possible to do this using a regular expression, without splitting the nodes and applying the transformation to each substring?

Thanks in advance,
Tiago

Tiago Fernandez
http://www.tiago182.spyw.com/

Ilja Preuss

author

Posts: 14112

posted 15 years ago

It should be possible using String.replaceAll. Is that what you were looking for?

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

Bill Cruise

Ranch Hand

Posts: 148

posted 15 years ago

It would be tricky to try and do this with regular expression backreferences to replace the entire <Node>.*</Node> string with <Node><![CDATA[.*]]></Node>. You can get around this by doing it in two passes. First replace the opening tag, then the closing tag.
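
The two-pass idea can be sketched as follows. This is only a minimal illustration, assuming the tags are exactly <Node> and </Node> (no attributes, no nesting); the class name TwoPassCdata and the method wrap are made up for the example.

```java
public class TwoPassCdata {
    // Wraps the text of every <Node> element in a CDATA section using two
    // literal (non-regex) replacements: first the opening tag, then the closing tag.
    static String wrap(String xml) {
        return xml.replace("<Node>", "<Node><![CDATA[")
                  .replace("</Node>", "]]></Node>");
    }

    public static void main(String[] args) {
        System.out.println(wrap("<Node>Foo</Node><Node>Bar</Node>"));
        // prints <Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>
    }
}
```

String.replace works on literal text, so none of the characters in the tags or in the CDATA markers need escaping here.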

Ilja Preuss

author

Posts: 14112

posted 15 years ago

Why would that be tricky? I haven't tried, but shouldn't something like the following work?

str.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[\\1]]></Node>");

Charles Lyons

Author

Posts: 836

posted 15 years ago

Be careful of greedy matching (the default matching mode) in the previous example: otherwise the .* will match everything between the very first <Node> and the very last </Node> in the document!
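
To make the difference concrete, here is a small sketch that compares the greedy .* with the reluctant .*? on the sample input from the first post. The class name GreedyVsReluctant and the helper firstGroup are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns the text captured by group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<Node>Foo</Node><Node>Bar</Node>";
        // Greedy: runs from the first <Node> to the last </Node>.
        System.out.println(firstGroup("<Node>(.*)</Node>", xml));   // prints Foo</Node><Node>Bar
        // Reluctant: stops at the first </Node> it can reach.
        System.out.println(firstGroup("<Node>(.*?)</Node>", xml));  // prints Foo
    }
}
```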

Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340)

Bill Cruise

Ranch Hand

Posts: 148

posted 15 years ago

That's the first thing I tried too, but the backreference doesn't work with String's replaceAll method, because the enclosing parentheses are in the regex and the backreference to the group is in the replacement String.

Charles Lyons

Author

Posts: 836

posted 15 years ago

You need to use dollar $ signs for captured sub-sequences... See: Matcher.replaceAll(String)
[ July 03, 2008: Message edited by: Charles Lyons ]
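
Putting the pieces of the thread together, a one-pass version with a $1 group reference might look like the sketch below. It assumes plain <Node> tags without attributes or nesting, and the class name CdataReplaceAll and method wrap are invented for the example.

```java
public class CdataReplaceAll {
    // In a Java replacement string, captured groups are written as $1, $2, ...
    // (a backslash only escapes the next character, it does not form a backreference).
    static String wrap(String xml) {
        return xml.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[$1]]></Node>");
    }

    public static void main(String[] args) {
        System.out.println(wrap("<Node>Foo</Node><Node>Bar</Node>"));
        // prints <Node><![CDATA[Foo]]></Node><Node><![CDATA[Bar]]></Node>
    }
}
```

By default the dot does not match line terminators, so node content that spans several lines would need the DOTALL flag, for example by prefixing the pattern with (?s).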

Paul Clapham

Marshal

Posts: 28226

posted 15 years ago

Regex problems aside, you would also have to unescape things like ampersands and less-than symbols. For example you would want to convert to
[ July 03, 2008: Message edited by: Paul Clapham ]
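
As an illustration of the unescaping point, the sketch below converts the five predefined XML entities back to literal characters before wrapping the text in CDATA. The sample input is hypothetical, as are the names CdataUnescape, unescape and wrap. Numeric character references and content that itself contains ]]> are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdataUnescape {
    private static final Pattern NODE = Pattern.compile("<Node>(.*?)</Node>", Pattern.DOTALL);

    // Converts the five predefined XML entities back to literal characters.
    // &amp; is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    static String wrap(String xml) {
        Matcher m = NODE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cdata = "<Node><![CDATA[" + unescape(m.group(1)) + "]]></Node>";
            // quoteReplacement keeps any '$' or '\' in the node text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(cdata));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("<Node>a &amp; b &lt; c</Node>"));
        // prints <Node><![CDATA[a & b < c]]></Node>
    }
}
```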

Ilja Preuss

author

Posts: 14112

posted 15 years ago

Originally posted by Charles Lyons:
You need to use dollar $ signs for captured sub-sequences... See: Matcher.replaceAll(String)

[ July 03, 2008: Message edited by: Charles Lyons ]

Good point!

Ilja Preuss

author

Posts: 14112

posted 15 years ago

Originally posted by Charles Lyons:
Be careful of greedy matching (the default matching mode) in the previous example: otherwise the .* will match everything between the very first <Node> and the very last </Node> in the document!

If you are talking about my example, those are exactly *not* greedy.

Charles Lyons

Author

Posts: 836

posted 15 years ago

Originally posted by Ilja Preuss:
If you are talking about my example, those are exactly *not* greedy.

Nope - I did notice the extra "?" in your example. It's just something very easy to overlook, generally not well understood and I thought a good idea to point out!

Mike Simmons

Master Rancher

Posts: 4830

posted 15 years ago

Another complication: can node elements be nested inside other Nodes? That could get ugly:

That might become:

The above is almost certainly not what you want here, but I don't know what you do want. With luck, nesting never occurs, and this will be easier.
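
To show what nesting does to the regex approach discussed earlier in the thread, here is a small sketch with a made-up nested input (not the example from the post above). The class name NestedNodes and the method wrap are hypothetical.

```java
public class NestedNodes {
    // Same reluctant pattern and $1 replacement as suggested earlier in the thread.
    static String wrap(String xml) {
        return xml.replaceAll("<Node>(.*?)</Node>", "<Node><![CDATA[$1]]></Node>");
    }

    public static void main(String[] args) {
        // The outer <Node> is paired with the inner </Node>, because the
        // reluctant match stops at the first closing tag it finds.
        System.out.println(wrap("<Node>a<Node>b</Node>c</Node>"));
        // prints <Node><![CDATA[a<Node>b]]></Node>c</Node>
    }
}
```

The result has a stray closing tag and is no longer well-formed, which is one reason an XML parser is usually a safer tool than a regex once nesting is possible.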

Carey Evans

Ranch Hand

Posts: 225
