
Matcher.find() -> looks one past end of String?

Richard Parker
Ranch Hand

Joined: Jan 23, 2007
Posts: 70
Hello,

I have a question about the Chapter 6 Self Test question #1 in the K&B book.
Here is the question from the book:

-------------------------------------------
Given:

import java.util.regex.*;

class Regex2 {
    public static void main(String[] args) {
        // Pattern and source are hard-coded to the same values as the command line below.
        String pattern = "\\d*";
        String source = "ab34ef";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(source);

        boolean b = false;
        while (b = m.find()) {
            // print (not println), so the output stays on one line as in the answer below
            System.out.print(m.start() + m.group());
        }
    }
}

And the command line:

java Regex2 "\d*" ab34ef

-------------------------------------------
Correct Answer: 01234456
-------------------------------------------

My question is: does Matcher.find() always look one position past the end of the source String? In this example I would have thought the answer to be:

0123445

because there are no more characters after position 5.

Any thoughts on this will be greatly appreciated.
Thanks in advance,

Richard


"...it takes all the running you can do to keep in the same place.
If you want to get somewhere else, you must run at least twice as fast as that!"
~ Through the Looking-Glass
Jesse Custer
Ranch Hand

Joined: Feb 07, 2007
Posts: 45
SCJP FAQ Page
Richard Parker
Ranch Hand

Joined: Jan 23, 2007
Posts: 70
Sweet!
Thanks for the link (I'll probably be referring to this often.)

So:
"The asterisk (*) is a "greedy quantifier," specifying that whatever precedes it (in this case, any digit) should be matched zero or more times. By allowing for zero occurrences, a match of zero length is possible. Because a match of zero length is possible, the find() method will check the index following the last character of input."

A match of zero length for greedy quantifiers seems weird.
Something to definitely keep in mind.
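
To make the zero-length matches visible, here is a minimal sketch that prints the start index, end index, and matched text for every successful find(). The class name FindTrace and the method trace() are made up for this illustration; they are not from the book or the FAQ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the name FindTrace is an assumption, not from the book.
class FindTrace {

    // Returns one line per successful find(): start..end followed by the quoted match.
    static String trace(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append("..").append(m.end())
               .append(" \"").append(m.group()).append("\"\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "ab34ef" has length 6, so index 6 is the position just past 'f'.
        System.out.print(trace("\\d*", "ab34ef"));
        // Prints:
        // 0..0 ""
        // 1..1 ""
        // 2..4 "34"
        // 4..4 ""
        // 5..5 ""
        // 6..6 ""
    }
}
```

The last line, 6..6 "", is the empty match at the end of the input, which is where the trailing 6 in 01234456 comes from. With "\\d+" instead, an empty match is impossible and the only line printed is 2..4 "34".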

Thank you!
Jesse Custer
Ranch Hand

Joined: Feb 07, 2007
Posts: 45
You're welcome.

By the way, regex is something that took me quite some time to understand. It doesn't always behave the way you think it will.

For example, take the same code you gave but change the pattern to "\\d*?". What do you think the result will be?
Javier Sanchez Cerrillo
Ranch Hand

Joined: Aug 02, 2006
Posts: 152
For example, take the same code you gave but change the pattern to "\\d*?". What do you think the result will be?


Reluctant quantifiers are not covered on the exam, and neither are possessive quantifiers.


SCJP 5.0 95%

The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.
Bijendra S. Rajput
Ranch Hand

Joined: Sep 19, 2006
Posts: 41
Hi Jesse,

I am really surprised by the output of this program when I run it with

java Regex2 "\d*?" ab34df

m.group() is not printing anything. I'm confused.

Can you help me, please?


Thanks

Regards,
------------------------------
Bijendra S. Rajput
SCJP 1.5
------------------------------
Bijendra S. Rajput
Ranch Hand

Joined: Sep 19, 2006
Posts: 41
Sorry, I forgot to write the output:

0123456
Jesse Custer
Ranch Hand

Joined: Feb 07, 2007
Posts: 45
I just checked it, and Javier is right. Only greedy quantifiers are on the exam. But reluctant quantifiers are in the K&B book so it's not totally irrelevant.

What you probably expected was that m.group() would print '3' at position 2 and '4' at position 3, which would give the output: 012334456

The expression "\\d*?" matches zero or more digits, and since it uses a reluctant quantifier it will match AS LITTLE AS POSSIBLE. At position 2 it comes across '3', which is a digit. Instead of consuming this digit, it matches zero digits, because an empty match is the shortest match that still satisfies the expression. The same thing happens at position 3 with '4'.
So the output is indeed: 0123456

I hope this explanation makes it clearer, because I find it hard to explain.
Try playing with the next piece of code if it's still unclear what the regex returns.
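
The code attached to this post is not preserved in the thread. As a stand-in, here is a minimal sketch (not the original code) that runs the same find() loop as the question against greedy and reluctant patterns, so the outputs can be compared side by side. The class name QuantifierDemo and the method run() are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; QuantifierDemo and run() are assumed names, not from the thread.
class QuantifierDemo {

    // Same loop as the question: concatenates m.start() + m.group() for every match.
    static String run(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "ab34ef";
        System.out.println("greedy    \\d*  : " + run("\\d*", source));   // 01234456
        System.out.println("reluctant \\d*? : " + run("\\d*?", source));  // 0123456
        System.out.println("greedy    \\d+  : " + run("\\d+", source));   // 234
        System.out.println("reluctant \\d+? : " + run("\\d+?", source));  // 2334
    }
}
```

The "\\d+?" line shows the other side of reluctance: because at least one digit is required, the matcher takes exactly one digit per match ('3' at position 2, then '4' at position 3) instead of an empty string.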


Greetings
 