Win a copy of Head First Android this week in the Android forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Tim Cooke
  • Junilu Lacar
Sheriffs:
  • Rob Spoor
  • Devaka Cooray
  • Jeanne Boyarsky
Saloon Keepers:
  • Jesse Silverman
  • Stephan van Hulst
  • Tim Moores
  • Carey Brown
  • Tim Holloway
Bartenders:
  • Jj Roberts
  • Al Hobbs
  • Piet Souris

Confusion Zero Length Match

 
Greenhorn
Posts: 8
IntelliJ IDE Tomcat Server Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,
I was reading greedy quantifier in K&B. I came across zero-length match which are pretty tricky. Here is a slight modification
to the exam and that's the input given from the console



From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated



Can someone explain this behavior. Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?

Thanks,
 
Greenhorn
Posts: 19
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Been Zaidi wrote:From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated


No, the ? only binds to the second a, so the pattern that regex is looking for is one 'a' followed by zero or one 'a'
To look for one or zero matches of aa, the regex would be "(aa)?".

Been Zaidi wrote:Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?


Imagine you have the regex ".*c" and are looking for a match in "abc". Since the * is greedy, the initial match for the beginning of the regex (".*") would be as long as possible, therefore matching the whole string. But then the end of the regex wouldn't fit anymore, so the pattern matching has to backtrack, making .* only match ab instead of abc.
 
Been Zaidi
Greenhorn
Posts: 8
IntelliJ IDE Tomcat Server Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Norbert Muench wrote:

Been Zaidi wrote:From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated


No, the ? only binds to the second a, so the pattern that regex is looking for is one 'a' followed by zero or one 'a'
To look for one or zero matches of aa, the regex would be "(aa)?".

Been Zaidi wrote:Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?


Imagine you have the regex ".*c" and are looking for a match in "abc". Since the * is greedy, the initial match for the beginning of the regex (".*") would be as long as possible, therefore matching the whole string. But then the end of the regex wouldn't fit anymore, so the pattern matching has to backtrack, making .* only match ab instead of abc.



Dear Nobert,

I am able to understand the first part of your explanation. It was very helpful. Thanks a lot.

The quoted part, i am a little confused. Can you elaborate it a little simply. Looking forward to hear.

Thanks,
 
Ranch Hand
Posts: 451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Example:
.*c means a string with zero or more meta character followed by c.

How about this aaac? Yes. aaa is matched by .*

How about this c? Yes, zero character followed by c.
 
Ranch Hand
Posts: 583
Firefox Browser Notepad Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Helen Ma wrote:Example:
.*c means a string with zero or more meta character followed by c.

How about this aaac? Yes. aaa is matched by .*

How about this c? Yes, zero character followed by c.



I've doubt here, c is already consumed so how it is being count again ??
 
Ranch Hand
Posts: 1183
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Been Zaidi wrote:
The quoted part, i am a little confused. Can you elaborate it a little simply. Looking forward to hear.



We can look at -



Regards,
Dan

 
Dan Drillich
Ranch Hand
Posts: 1183
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Btw, this greedy regex can be converted to a Reluctant one as Greedy, Reluctant, and Possessive shows.

Regards,
Dan
 
Greenhorn
Posts: 18
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello Guyz,
The matcher will search for the atleast one "a". These quantifiers will apply for the exactly one character that means it will find one 'a' and then 0 or 1 a. Then it will find the first 'a' which is based on the criteria one 'a' then 0 or 1 'a'. Then it will find the second match with the 'aa'.

Thanks and Regards
Raju Salla
reply
    Bookmark Topic Watch Topic
  • New Topic