aspose file tools*
The moose likes Java in General and the fly likes Difference Among Greedy, Reluctant, and Possessive Quantifiers Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of The Java EE 7 Tutorial Volume 1 or Volume 2 this week in the Java EE forum
or jQuery UI in Action in the JavaScript forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "Difference Among Greedy, Reluctant, and Possessive Quantifiers" Watch "Difference Among Greedy, Reluctant, and Possessive Quantifiers" New topic
Author

Difference Among Greedy, Reluctant, and Possessive Quantifiers

Ninad Kulkarni
Ranch Hand

Joined: Aug 31, 2007
Posts: 787

Sun Java Tutorials wrote:Differences Among Greedy, Reluctant, and Possessive Quantifiers
There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

To illustrate, consider the input string xfooxxxxxxfoo.


Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found



I am confused and didn't understand after reading following statements from Java tutorial for greedy & reluctant quantifier
Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.
and
The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.
Please help to understand these statements.

SCJP 5.0 - JavaRanch FAQ - Java Beginners FAQ - SCJP FAQ - SCJP Mock Tests - Tutorial - JavaSE7 - JavaEE6 -Generics FAQ - JLS - JVM Spec - Java FAQs - Smart Questions
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11246
    
  16

remember that computers are really pretty stupid. they are, however, VERY fast.

Similar to the example, let's assume your input string is "xfooxxxxxxfoo", and your regex is ".*foo". Forgetting about greedy, reluctant, etc. this basically says "some number of characters, followed by "foo".

looking at the string, there are two possible ways this could match. "xfoo" and "xfooxxxxxxfoo" both fit the bill, as both have some number of characters before 'foo'.

The greedy quantifier would take the entire string: "xfooxxxxxxfoo" and say "does this match"? in our case it does, so that is what is returned. If instead, we had used the greedy quantifer and ".*fooxxxxxx", it would first look at "xfooxxxxxxfoo". That doesn't match, so it would then look at "xfooxxxxxxfo", and then "xfooxxxxxxf" and then "xfooxxxxxx" which does finally match. Depending on what exactly your regex is, it may work its way down to a single character or even no characters.

The reluctant quantifier would start with "x" and see if it matches, then "xf", then "xfo", then "xfoo" and stops as soon as it finds a match.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Ninad Kulkarni
Ranch Hand

Joined: Aug 31, 2007
Posts: 787

Thanks a lot Fred, Now I understood the difference properly.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Difference Among Greedy, Reluctant, and Possessive Quantifiers