File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Confusion Zero Length Match

 
Been Zaidi
Greenhorn
Posts: 8
Chrome IntelliJ IDE Tomcat Server
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
I was reading greedy quantifier in K&B. I came across zero-length match which are pretty tricky. Here is a slight modification
to the exam and that's the input given from the console



From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated



Can someone explain this behavior. Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?

Thanks,
 
Norbert Muench
Greenhorn
Posts: 19
  • 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Been Zaidi wrote:From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated

No, the ? only binds to the second a, so the pattern that regex is looking for is one 'a' followed by zero or one 'a'
To look for one or zero matches of aa, the regex would be "(aa)?".

Been Zaidi wrote:Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?

Imagine you have the regex ".*c" and are looking for a match in "abc". Since the * is greedy, the initial match for the beginning of the regex (".*") would be as long as possible, therefore matching the whole string. But then the end of the regex wouldn't fit anymore, so the pattern matching has to backtrack, making .* only match ab instead of abc.
 
Been Zaidi
Greenhorn
Posts: 8
Chrome IntelliJ IDE Tomcat Server
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Norbert Muench wrote:
Been Zaidi wrote:From the console here is the input given


In my opinion, the instruction is asking to find one or zero match of aa. But here is the output generated

No, the ? only binds to the second a, so the pattern that regex is looking for is one 'a' followed by zero or one 'a'
To look for one or zero matches of aa, the regex would be "(aa)?".

Been Zaidi wrote:Another additional thing, inside the book it is written that the engine backtracks as well. Why is that so?

Imagine you have the regex ".*c" and are looking for a match in "abc". Since the * is greedy, the initial match for the beginning of the regex (".*") would be as long as possible, therefore matching the whole string. But then the end of the regex wouldn't fit anymore, so the pattern matching has to backtrack, making .* only match ab instead of abc.


Dear Nobert,

I am able to understand the first part of your explanation. It was very helpful. Thanks a lot.

The quoted part, i am a little confused. Can you elaborate it a little simply. Looking forward to hear.

Thanks,
 
Helen Ma
Ranch Hand
Posts: 451
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Example:
.*c means a string with zero or more meta character followed by c.

How about this aaac? Yes. aaa is matched by .*

How about this c? Yes, zero character followed by c.
 
saloni jhanwar
Ranch Hand
Posts: 583
Firefox Browser Notepad Windows
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Helen Ma wrote:Example:
.*c means a string with zero or more meta character followed by c.

How about this aaac? Yes. aaa is matched by .*

How about this c? Yes, zero character followed by c.


I've doubt here, c is already consumed so how it is being count again ??
 
Dan Drillich
Ranch Hand
Posts: 1183
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Been Zaidi wrote:
The quoted part, i am a little confused. Can you elaborate it a little simply. Looking forward to hear.



We can look at -



Regards,
Dan

 
Dan Drillich
Ranch Hand
Posts: 1183
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Btw, this greedy regex can be converted to a Reluctant one as Greedy, Reluctant, and Possessive shows.

Regards,
Dan
 
raju salla
Greenhorn
Posts: 18
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello Guyz,
The matcher will search for the atleast one "a". These quantifiers will apply for the exactly one character that means it will find one 'a' and then 0 or 1 a. Then it will find the first 'a' which is based on the criteria one 'a' then 0 or 1 'a'. Then it will find the second match with the 'aa'.

Thanks and Regards
Raju Salla
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic