my dog learned polymorphism
The moose likes Programmer Certification (SCJP/OCPJP) and the fly likes regex confusion Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)
Bookmark "regex confusion" Watch "regex confusion" New topic
Author

regex confusion

David G Harris
Greenhorn

Joined: Sep 23, 2009
Posts: 6
In Chapter 6 of the SCJP book, Self Test problem 1 says:

Given:


And the command line:

java Regex2 "\d*" ab34ef

What is the result?

A.234

B.334

C.2334

D.0123456

E.01234456

F.12334567

G.Compilation fails


The answer is E and I understand up to the point of there being a '6' at the end. of the output. I was thinking in terms of indexes so I don't understand why the output doesn't end at 5.
Ulf Dittmer
Rancher

Joined: Mar 22, 2005
Posts: 42958
    
  73
That's a FAQ: http://www.coderanch.com/how-to/java/SCJP-FAQ#kb-regexp
fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
hi,
I have a question about this code and found this post through search so i thought of asking here.
the m.find() method returns true if it gets a match, so how is it possible that code gets executed inside the while loop before 34?

thank you.
Ankit Garg
Sheriff

Joined: Aug 03, 2008
Posts: 9405
    
  20

That's because the regular expression did find a match. Since the regular expression is looking for zero or more occurrences of a digit, it finds a blank match at each index in the string...


SCJP 6 | SCWCD 5 | Javaranch SCJP FAQ | SCWCD Links
fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
Ankit Garg wrote:That's because the regular expression did find a match. Since the regular expression is looking for zero or more occurrences of a digit, it finds a blank match at each index in the string...


oh yea! i never thought of that! i thought the first match has to be a digit followed by zero or more digits. Thanks.
It has been a very busy week here at work, i couldn't concentrate at all while studying, i felt like i'm hitting
Scotty Mitchell
Ranch Hand

Joined: Aug 09, 2011
Posts: 46
I had posted this in reponse to another post regarding the same thing, so I thought id just copy it here as well.

The regex pattern \d* matches ZERO or MORE digits. The key thing to note is the ZERO possiblity.

m.find() Matcher class method definition: Attempts to find the next subsequence of the input sequence that matches the pattern.
m.start() Matcher class method definition: Returns the start index of the previous match.
m.group() Matcher class method definition: Returns the input subsequence matched by the previous match.


Upon the first iteration (b = m.find()) is set true because the * matches on ZERO digits found.

Imagine the string as looking like this |0|a|1|b|2|3|3|4|4|e|5|f|6|
Where the bold characters are part of the string, and the numbers indicate an index

m.start() = 0 (The "space" behind the "a", I guess you could say. The darn * gets that I believe)
m.group() = ""

m.start() = 1
m.group() = ""

m.start() = 2
m.group() = "34" MATCH OCCURED!

The skip of the index occurs because the match covered the |3| index. Say it was "345" in the string then group would be "345" and m.start() would be 5 on the next iteration.

m.start() = 4
m.group() = ""

m.start() = 5
m.group() = ""

m.start() = 6
m.group() = ""

I believe this is how it works anyway...I ran a quick test with the same pattern trying to match "" and it came back with m.start() with index 0!
fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
Scotty Mitchell wrote:I had posted this in reponse to another post regarding the same thing, so I thought id just copy it here as well.

The regex pattern \d* matches ZERO or MORE digits. The key thing to note is the ZERO possiblity.

m.find() Matcher class method definition: Attempts to find the next subsequence of the input sequence that matches the pattern.
m.start() Matcher class method definition: Returns the start index of the previous match.
m.group() Matcher class method definition: Returns the input subsequence matched by the previous match.


Upon the first iteration (b = m.find()) is set true because the * matches on ZERO digits found.

Imagine the string as looking like this |0|a|1|b|2|3|3|4|4|e|5|f|6|
Where the bold characters are part of the string, and the numbers indicate an index

m.start() = 0 (The "space" behind the "a", I guess you could say. The darn * gets that I believe)
m.group() = ""

m.start() = 1
m.group() = ""

m.start() = 2
m.group() = "34" MATCH OCCURED!

The skip of the index occurs because the match covered the |3| index. Say it was "345" in the string then group would be "345" and m.start() would be 5 on the next iteration.

m.start() = 4
m.group() = ""

m.start() = 5
m.group() = ""

m.start() = 6
m.group() = ""

I believe this is how it works anyway...I ran a quick test with the same pattern trying to match "" and it came back with m.start() with index 0!


thank you
 
It is sorta covered in the JavaRanch Style Guide.
 
subject: regex confusion
 
It's not a secret anymore!