
regex confusion

David G Harris
Greenhorn

Joined: Sep 23, 2009
Posts: 6
In Chapter 6 of the SCJP book, Self Test problem 1 says:

Given:


And the command line:

java Regex2 "\d*" ab34ef

What is the result?

A.234

B.334

C.2334

D.0123456

E.01234456

F.12334567

G.Compilation fails


The answer is E, and I understand everything up to the '6' at the end of the output. I was thinking in terms of indexes, so I don't understand why the output doesn't end at 5.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42264
That's a FAQ: http://www.coderanch.com/how-to/java/SCJP-FAQ#kb-regexp


fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
Hi,
I have a question about this code and found this post through search, so I thought I'd ask here.
The m.find() method returns true if it finds a match, so how is it possible that the code inside the while loop gets executed before 34?

thank you.
Ankit Garg
Sheriff

Joined: Aug 03, 2008
Posts: 9304

That's because the regular expression did find a match. Since the regular expression is looking for zero or more occurrences of a digit, it finds an empty (zero-length) match at each index in the string where there is no digit to consume...
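
To see the empty matches directly, here is a minimal sketch that lists every match \d* finds in "ab34ef" and compares it with \d+ (one or more digits). The class name EmptyMatchDemo and the helper matches are made up for illustration; they are not from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the SCJP question.
class EmptyMatchDemo {
    // Records start:"group" for every successful find() call.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":\"" + m.group() + "\"");
        }
        return found;
    }

    public static void main(String[] args) {
        // Zero or more digits: an empty match wherever no digit can be consumed.
        System.out.println(matches("\\d*", "ab34ef"));
        // prints [0:"", 1:"", 2:"34", 4:"", 5:"", 6:""]

        // One or more digits: only the real digit run matches.
        System.out.println(matches("\\d+", "ab34ef"));
        // prints [2:"34"]
    }
}
```

With \d+ the loop body would run only once, which is probably what most people expect at first glance.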


fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
Oh yeah! I never thought of that! I thought the first match had to be a digit followed by zero or more digits. Thanks.
It has been a very busy week here at work, I couldn't concentrate at all while studying, I felt like I'm hitting
Scotty Mitchell
Ranch Hand

Joined: Aug 09, 2011
Posts: 46
I had posted this in response to another post regarding the same thing, so I thought I'd just copy it here as well.

The regex pattern \d* matches ZERO or MORE digits. The key thing to note is the ZERO possibility.

m.find() Matcher class method definition: Attempts to find the next subsequence of the input sequence that matches the pattern.
m.start() Matcher class method definition: Returns the start index of the previous match.
m.group() Matcher class method definition: Returns the input subsequence matched by the previous match.


On the first iteration, b = m.find() evaluates to true because the * also matches when ZERO digits are found.

Imagine the string as looking like this: [0] a [1] b [2] 3 [3] 4 [4] e [5] f [6]
Where the numbers in brackets indicate an index and the other characters are part of the string

m.start() = 0 (The empty "space" in front of the "a", I guess you could say. The * is what allows that zero-length match, I believe)
m.group() = ""

m.start() = 1
m.group() = ""

m.start() = 2
m.group() = "34" MATCH OCCURRED!

Index 3 is skipped because the match "34" already covered it. Say it was "345" in the string: then group would be "345" and m.start() would be 5 on the next iteration.

m.start() = 4
m.group() = ""

m.start() = 5
m.group() = ""

m.start() = 6
m.group() = ""

I believe this is how it works anyway... I ran a quick test with the same pattern against the empty string "" and it came back with a match whose m.start() was 0!
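
The walk-through above can be checked with a small program. The original listing is not shown in this thread, so this is only a sketch that assumes the loop looks the way the posts describe it: call m.find() repeatedly and print m.start() followed by m.group(). The class name Regex2Sketch and the method run are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed reconstruction for illustration; not the book's listing.
class Regex2Sketch {
    // Concatenates m.start() and m.group() for every match found.
    static String run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pieces: "0", "1", "2"+"34", "4", "5", "6"
        System.out.println(run("\\d*", "ab34ef"));  // prints 01234456

        // Longer digit run: after "345" the next match starts at 5.
        System.out.println(run("\\d*", "ab345ef")); // prints 012345567

        // Empty input still gives one empty match at index 0.
        System.out.println(run("\\d*", ""));        // prints 0
    }
}
```

The trailing 6 is the empty match at the end of the input: index 6 is the position after the "f", and \d* happily matches zero digits there too.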
fadi aboona
Ranch Hand

Joined: Apr 25, 2010
Posts: 71
thank you
 