• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Greedy quantifier

 
Ranch Hand
Posts: 65
1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi ,
I am reading chapter6 of K&B 1.5 and I am not very clear about Greedy quantifiers.
Its says that "+", "*", and "?" are greedy quantifiers. And later it says for greedy quantifier search is from Right to Left.
So if I am right then for the code below there should be only one result and the result should start from position 8 and The matches should be at position: "8","7","6","4","3".
which is not true.
I don;t know what am I missing here . I mean is "+" not a greedy quantifier or Greedy quantifier doesn;t start from Right to Left or something else?


Can anyone please explain this or provide some good link about this?
 
Ranch Hand
Posts: 1032
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Jeena,

What the authors mean when they say that greedy quantifiers match from right to left is not that the string is parsed from right to left, but that a greedy quantifier will match as much as possible, which means that if you have a regex "\\d+" and you have a string "a12345bcd", the regex engine will look at 1 and continue with 2, 3, 4, and 5 (so you go as far to the right as possible.) Then it sees b and that's not a match, so the match in this case will be "12345." In contrast, reluctant quantifiers won't continue looking towards the right once a match has been found, because they match as little as possible.
 
Sheriff
Posts: 9707
43
Android Google Web Toolkit Hibernate IntelliJ IDE Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Jeena, AFAIK regex never starts matching from the right. The statement is just there to make you understand the concept. As Ruben said, regex engine keeps searching the search criteria even if a match which satisfies the expression is found. They try to fir as much of text into the match as it is possible...
 
Ruben Soto
Ranch Hand
Posts: 1032
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
And, since we are talking about greedy and reluctant quantifiers:

?? will only match the empty string
*? will only match the empty string
+? will only match one instance of the item its quantifying
 
author
Posts: 23951
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ruben Soto wrote:And, since we are talking about greedy and reluctant quantifiers:

?? will only match the empty string
*? will only match the empty string
+? will only match one instance of the item its quantifying



Example please. What do you mean that these qualifiers "will only match the empty string"? Whether they match an empty string is also dependent on the whole pattern, not just the qualifier.

Henry
 
Ruben Soto
Ranch Hand
Posts: 1032
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Henry Wong wrote:

Ruben Soto wrote:And, since we are talking about greedy and reluctant quantifiers:

?? will only match the empty string
*? will only match the empty string
+? will only match one instance of the item its quantifying



Example please. What do you mean that these qualifiers "will only match the empty string"? Whether they match an empty string is also dependent on the whole pattern, not just the qualifier.

Henry


I should have been a little more specific.

I think that "(anything)??" will only match the empty string. The same for "(anything)*?", ".??", ".*?", "1??", "a*?", etc. Since the lower boundary of ? and * is 0 occurrences, making them reluctant makes them only match zero instances (the empty string.)

If used as a part of a bigger pattern, they won't affect the result because of this reason.

For example, the pattern "1234.?(anything)??abc(anything)*?" is equivalent to the pattern "1234.?abc"

Something similar happens with +?, except since the lower boundary of + is 1, this will always cause it to only match one of whatever it is modifying.

This is my understanding although I might be wrong.
 
Henry Wong
author
Posts: 23951
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

I think that "(anything)??" will only match the empty string. The same for "(anything)*?", ".??", ".*?", "1??", "a*?", etc. Since the lower boundary of ? and * is 0 occurrences, making them reluctant makes them only match zero instances (the empty string.)

If used as a part of a bigger pattern, they won't affect the result because of this reason.

For example, the pattern "1234.?(anything)??abc(anything)*?" is equivalent to the pattern "1234.?abc"



Sorry, but this is not true. The reluctant qualifier will match as little as possible, while still allowing the overall pattern to match. If matching zero items causes the overall pattern to not match, it will match more... for example...

Let's say the pattern is ".*?AAA". This pattern matches zero or more of any characters, reluctantly, followed by "AAA'. Now, let's assume the string is "abcdefAAAhijklAAA". And you call the find() method...

It will try to match as little as possible, but if it matches zero characters, the bigger pattern will not match. The smallest amount that the ".*?" portion of the pattern can match is "abcdef" -- any less and the ".*?AAA" pattern won't match via the find() method.

Henry
 
Ruben Soto
Ranch Hand
Posts: 1032
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, you are correct. I see what you mean by "match as little as possible while allowing the overall pattern to match."

What I find confusing, is this type of result:

source: ab12345cd
pattern: \\d*? // Same results if pattern is \\d??
matches: (Empty string) at 0
(Empty string) at 1 // Why not match 12345 here?
(Empty string) at 2
(Empty string) at 3
(Empty string) at 4
(Empty string) at 5
(Empty string) at 6
(Empty string) at 7
(Empty string) at 8
(Empty string) at 9

I think that the strategy is "match as little as possible, while allowing the overall pattern to match starting at the earliest index of the source string." That is not as catchy, but seems to more accurately depict how the regex engine handles this.

Thanks,

Ruben
 
Henry Wong
author
Posts: 23951
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

I think that the strategy is "match as little as possible, while allowing the overall pattern to match starting at the earliest index of the source string." That is not as catchy, but seems to more accurately depict how the regex engine handles this.



Not really. There are two things going on -- and they are only somewhat related.

First, the regex. In this case, the whole pattern matches zero characters, so it will always match zero characters. There is no bigger pattern that will force it to match more -- so it will always match zero length in this example.

Second, the find() method. The method will start from beginning of the string looking for a match. If one is found, it will start the next match at the end of the one that is found. Unless, of course, the match is zero length, then it will go to the next character to look for the next match. And finally, if it doesn't find a match at the current location, it will move to the next location.

The behavior of the find() method is the same, regardless of the regex pattern. So, it is better to keep those two things in mind, and try to keep them separated.

Henry
 
Ruben Soto
Ranch Hand
Posts: 1032
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Henry Wong wrote:

I think that the strategy is "match as little as possible, while allowing the overall pattern to match starting at the earliest index of the source string." That is not as catchy, but seems to more accurately depict how the regex engine handles this.



Not really. There are two things going on -- and they are only somewhat related.

First, the regex. In this case, the whole pattern matches zero characters, so it will always match zero characters. There is no bigger pattern that will force it to match more -- so it will always match zero length in this example.

Second, the find() method. The method will start from beginning of the string looking for a match. If one is found, it will start the next match at the end of the one that is found. Unless, of course, the match is zero length, then it will go to the next character to look for the next match. And finally, if it doesn't find a match at the current location, it will move to the next location.

The behavior of the find() method is the same, regardless of the regex pattern. So, it is better to keep those two things in mind, and try to keep them separated.

Henry


I understand what you mean by keeping what the regex considers a match and the behavior of the find() method separate, Henry. I was testing this using a program which does use the Matcher.find() method, and maybe I was extracting conclusions without seeing through the behavior of the find() method. Thanks for explaining!
 
Consider Paul's rocket mass heater.
reply
    Bookmark Topic Watch Topic
  • New Topic