Doubt regarding Regex "?" quantifier

Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

Hi

I tried out the following piece of code given in SCJP for Java 6:
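
The listing itself is missing from this copy of the post. The following is only a minimal sketch that should reproduce the output shown below, assuming the pattern is "a?" and the source string is "baba" (both are named later in this post). The class name `QuantifierDemo` and the helper `describeMatches` are hypothetical and not taken from the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction; not the book's original listing.
class QuantifierDemo {

    // Builds one line per match in the form "start end group".
    static String describeMatches(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(' ')
               .append(m.end()).append(' ')
               .append(m.group()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed inputs: pattern "a?" applied to "baba".
        System.out.print(describeMatches("a?", "baba"));
    }
}
```

For the empty matches, `group()` returns an empty string, which is why those lines show only the two indices.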



The output is:

0 0
1 2 a
2 2
3 4 a
4 4


I have a small doubt regarding the "?" quantifier which is described as "fetches zero or one instance of the compiled pattern being supplied to the matcher source".

a) Why am I getting the above output? What does the "a?" pattern mean? As I understand it, it means find zero or one occurrence of "a" in the source string. So why does matcher.end() return the index after the matched "a"? When it finds "a" at position 1, it prints matcher.end() as 2. Why?

b) Secondly, why is 4 4 in the output? The index positions only go up to 3, right? ("baba") So why does it print a position that is not in the source string at all? Please help me understand what is happening here.


~ Mansukh
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36590
    
Write down all the places in that text where there is an a. Now write down all the places where there are 0 a-s.
Remember that is an exam question, so it is designed to confuse you.
Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

Campbell Ritchie wrote:Write down all the places in that text where there is an a. Now write down all the places where there are 0 a-s.
Remember that is an exam question, so it is designed to confuse you.


Ok. So I understood all the returned output except 4 4. Index position 4 isn't even part of the source string "baba"; its zero-based indices only run from 0 to 3. Then why does it return 4 4?
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36590
    
The a at position 3 is followed by no a-s.
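
A small sketch may make this easier to see. It assumes the same "a?" pattern and "baba" source; the class name `EndOfInputDemo` and the helper `lastMatchStart` are made up for illustration. Index 4 is not a character position but the position just past the last character, and an empty match can sit there. It is the same reason `end()` for the "a" at index 1 is 2: `end()` is the offset after the last matched character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EndOfInputDemo {

    // Returns the start index of the last match of regex in source, or -1 if none.
    static int lastMatchStart(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        int last = -1;
        while (m.find()) {
            last = m.start();
        }
        return last;
    }

    public static void main(String[] args) {
        String source = "baba";
        System.out.println(source.length());                    // 4
        System.out.println(lastMatchStart("a?", source));       // 4
        // Even an empty source has one position where zero a-s match.
        System.out.println(lastMatchStart("a?", ""));           // 0
        // substring(4, 4) is legal and yields the empty string.
        System.out.println("[" + source.substring(4, 4) + "]"); // []
    }
}
```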
Mansukhdeep Thind
Ranch Hand

Joined: Jul 27, 2010
Posts: 1157

Campbell Ritchie wrote:The a at position 3 is followed by no a-s.


We can say on similar lines that the first "b" is preceded by 0 a-s.
Himai Minh
Ranch Hand

Joined: Jul 29, 2012
Posts: 610
Mansukhdeep Thind wrote:Hi



The output is :

0 0
1 2 a
2 2
3 4 a
4 4


Secondly, why is 4 4 there in the output? The index positions are only till 3 right? ("baba") Then why does it print the one that is not there in the source string anyway? Please help me understand what is happening here.


It is not intuitive. a? can match a zero-length string, so find() also reports the empty match at the very end of the input string: index 4, just past the last character of "baba".
This is a tricky part of the exam. Take a look at one of the "Exam Watch" sections in KB's book.
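
One way to convince yourself is to compare "a?" with patterns that cannot match an empty string. This is only a sketch, assuming the same "baba" input as in the question; the class name `EmptyMatchDemo` and the helper `countMatches` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EmptyMatchDemo {

    // Counts how many times find() succeeds for regex on source.
    static int countMatches(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "a" and "a+" need at least one 'a', so only indices 1 and 3 match.
        System.out.println(countMatches("a", "baba"));  // 2
        System.out.println(countMatches("a+", "baba")); // 2
        // "a?" also matches the empty string at 0, 2 and 4 (end of input).
        System.out.println(countMatches("a?", "baba")); // 5
    }
}
```

The three extra matches for "a?" are exactly the lines in the output that print no "a".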
 