wood burning stoves*
The moose likes Programmer Certification (SCJP/OCPJP) and the fly likes Need help in understanding greedy, reluctant quantifiers in regular expressions?? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)
Bookmark "Need help in understanding greedy, reluctant quantifiers in regular expressions??" Watch "Need help in understanding greedy, reluctant quantifiers in regular expressions??" New topic
Author

Need help in understanding greedy, reluctant quantifiers in regular expressions??

gurpeet singh
Ranch Hand

Joined: Apr 04, 2012
Posts: 924
    
    1

I'am reading regular expression from the java tutorials given on Sun's site. Please refer the link http://docs.oracle.com/javase/tutorial/essential/regex/quant.html. Regarding greedy quantifiers it says that

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

What does here the line "backs off the input string by one character and tries again actually means? can somebody please explain with a simple example.

Same for reluctant(although they are not on SCJP 6). Again referring to the link above it says that "The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. " Here also what does "eat one character at a time looking for a match means ".
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

It just means that with greedy quantifiers the largest string will be matched that fits the pattern. With reluctant quantifiers the shortest string will be matched.

For example, for the input string "blablabla"; the pattern "(bla)+" will match the entire input, while the pattern "(bla)+?" will only match the first "bla" of the input string.
gurpeet singh
Ranch Hand

Joined: Apr 04, 2012
Posts: 924
    
    1

thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
  40

gurpeet singh wrote:thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?



Here is another example (using the find() method)...

String: ababababbabababXXXabababababbaba
Regex: .*XXX

In this example, the first part of the regex (".*") will match any character in a greedy way. This means that it will match the whole string. However, if it does that, then the whole regex (".*XXX") will fail. So, the first part of the regex will backoff, and match one less character. The whole regex will still fail. The first part of the regex will keep backing off, one character at a time, until it matches only "ababababbababab" -- which means that the whole regex will match "ababababbabababXXX" and succeed.

Now, of course, this is all happening within the regex engine. You don't see these match fails and backoffs. The find() method will only returns the matches -- and failure when it exhausted all possible matches.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
gurpeet singh
Ranch Hand

Joined: Apr 04, 2012
Posts: 924
    
    1

thanks henry . and in the future i would take care of not creating multiple threads for same issues. thanks for correcting me.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Need help in understanding greedy, reluctant quantifiers in regular expressions??