• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Need help in understanding greedy, reluctant quantifiers in regular expressions??

 
Ranch Hand
Posts: 924
1
Netbeans IDE Fedora Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'am reading regular expression from the java tutorials given on Sun's site. Please refer the link http://docs.oracle.com/javase/tutorial/essential/regex/quant.html. Regarding greedy quantifiers it says that

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

What does here the line "backs off the input string by one character and tries again actually means? can somebody please explain with a simple example.

Same for reluctant(although they are not on SCJP 6). Again referring to the link above it says that "The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. " Here also what does "eat one character at a time looking for a match means ".
 
Saloon Keeper
Posts: 15510
363
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It just means that with greedy quantifiers the largest string will be matched that fits the pattern. With reluctant quantifiers the shortest string will be matched.

For example, for the input string "blablabla"; the pattern "(bla)+" will match the entire input, while the pattern "(bla)+?" will only match the first "bla" of the input string.
 
gurpeet singh
Ranch Hand
Posts: 924
1
Netbeans IDE Fedora Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?
 
author
Posts: 23951
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Likes 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

gurpeet singh wrote:thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?




Here is another example (using the find() method)...

String: ababababbabababXXXabababababbaba
Regex: .*XXX

In this example, the first part of the regex (".*") will match any character in a greedy way. This means that it will match the whole string. However, if it does that, then the whole regex (".*XXX") will fail. So, the first part of the regex will backoff, and match one less character. The whole regex will still fail. The first part of the regex will keep backing off, one character at a time, until it matches only "ababababbababab" -- which means that the whole regex will match "ababababbabababXXX" and succeed.

Now, of course, this is all happening within the regex engine. You don't see these match fails and backoffs. The find() method will only returns the matches -- and failure when it exhausted all possible matches.

Henry
 
gurpeet singh
Ranch Hand
Posts: 924
1
Netbeans IDE Fedora Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
thanks henry . and in the future i would take care of not creating multiple threads for same issues. thanks for correcting me.
 
reply
    Bookmark Topic Watch Topic
  • New Topic