Win a copy of The Java Performance Companion this week in the Performance forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Need help in understanding greedy, reluctant quantifiers in regular expressions??

 
gurpeet singh
Ranch Hand
Posts: 924
1
Fedora Java Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'am reading regular expression from the java tutorials given on Sun's site. Please refer the link http://docs.oracle.com/javase/tutorial/essential/regex/quant.html. Regarding greedy quantifiers it says that

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from.

What does here the line "backs off the input string by one character and tries again actually means? can somebody please explain with a simple example.

Same for reluctant(although they are not on SCJP 6). Again referring to the link above it says that "The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. " Here also what does "eat one character at a time looking for a match means ".
 
Stephan van Hulst
Bartender
Pie
Posts: 5893
63
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It just means that with greedy quantifiers the largest string will be matched that fits the pattern. With reluctant quantifiers the shortest string will be matched.

For example, for the input string "blablabla"; the pattern "(bla)+" will match the entire input, while the pattern "(bla)+?" will only match the first "bla" of the input string.
 
gurpeet singh
Ranch Hand
Posts: 924
1
Fedora Java Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?
 
Henry Wong
author
Marshal
Pie
Posts: 21194
81
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
gurpeet singh wrote:thanks for the reply Steven.... but the example you gave with regular expression as (bla)+ and input string as blablabla will match in the very first attempt. the tutorial says that "If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from." I would like to know if the first attempt fails , what does "matcher backs off the input string by oe characted and tries again" means exactly ?



Here is another example (using the find() method)...

String: ababababbabababXXXabababababbaba
Regex: .*XXX

In this example, the first part of the regex (".*") will match any character in a greedy way. This means that it will match the whole string. However, if it does that, then the whole regex (".*XXX") will fail. So, the first part of the regex will backoff, and match one less character. The whole regex will still fail. The first part of the regex will keep backing off, one character at a time, until it matches only "ababababbababab" -- which means that the whole regex will match "ababababbabababXXX" and succeed.

Now, of course, this is all happening within the regex engine. You don't see these match fails and backoffs. The find() method will only returns the matches -- and failure when it exhausted all possible matches.

Henry
 
gurpeet singh
Ranch Hand
Posts: 924
1
Fedora Java Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thanks henry . and in the future i would take care of not creating multiple threads for same issues. thanks for correcting me.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic