aspose file tools*
The moose likes Java in General and the fly likes Recursive Regular Expression Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Recursive Regular Expression" Watch "Recursive Regular Expression" New topic
Author

Recursive Regular Expression

Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I'm looking to parse expressions like this:

${ one ${ two ${ three } } }

I need to match first on the whole thing. Then I'll take the contents of the outside brackets and parse again. Matching on the innermost brackets first will not do the job for me. Any thoughts? Thanks!!


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
I looked this up in Mastering Regular Expressions (p. 328, "Using a Dynamic Regex to Match Nested Pairs"). If you were using Perl, there's a solution in there. The book covers Java, but I'm not familiar with the native Java regex utilities yet.

As the section title implies, it uses a dynamic pattern, in this case a recursive pattern that includes itself. It works by having the included (dynamic) part of the pattern being created by arbitrary Perl code.

While not immediately helpful, you might find more information doing a Google search for dynamic patterns.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Thanks, I'll take a look. In googling for recursive I found a special operator R in some compilers. I probably went in a misleading direction with the title of this ... I need to match the outside expression with regex, then handle the inside with code.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Regular Expressions are greedy by default and work from left to right. So using the simple expression

"$\\{(.*)\\}"

should match the whole string and give you

" one ${ two ${ three } } "

as group 1.

Does that help???


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Ilja: that may work, but it will fail parsing something like

${one} ${two}

assuming the goal here is to grab "${one}" and "${two}" separately. This may or may not be an issue, depending on Stan's input...

Stan, can you put an upper limit on the nesting depth? The technique described in MRE will work in Java for any particular finite nesting depth. The cost is, the greater the depth, the more complex the regex. (Though it's not too difficult to generate - but it's real ugly to look at.) However java.util.regex does not have the dynamic regex construct from Perl (e.g. (??[expression]) ) which seems necessary to make the regex truly recursive. I'm not aware of any elegant Java-based regex to handle this. In Java it's probably best to write some non-regex code that simply iterates through each char in a String, counting occurances of "${" and "}", and saving a new capture group any time the latter count equals the former.


"I'm not back." - Bill Harding, Twister
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I finally got the right Google search for this. The key is "balanced" for parens and such in programming languages. The general conclusion is that it's not possible, and can't be solved with a new bit of syntax because of the whole nature of the regex engine architecture. Whew!

What I'm doing is parsing Wiki markup with regular expressions and building a tree of document elements. I might have to process these nested macros with text substitution before building the dom. A typical one looks like:

${linksto ${page name}}

I can regex find the inner ${page name} pretty easily, replace ${page name} with Home (for example), then find the outer ${linksto Home}. If I do this before parsing all the other Wiki markup then the text produced by the macro will be parsed with no further effort. Guess that will work.
[ October 01, 2004: Message edited by: Stan James ]
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Stan James:
The general conclusion is that it's not possible, and can't be solved with a new bit of syntax because of the whole nature of the regex engine architecture.


Or with other words: the language you want to parse isn't regular: http://en.wikipedia.org/wiki/Regular_grammar

It *is* context-free, though: http://en.wikipedia.org/wiki/Context-free_grammar
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Recursive Regular Expression
 
Similar Threads
subscripting in java
infix to postfix, need help catching errors such as having extra parenthesis
WA #1.....word association
Operators and Precedence
Javac: File Not Found