
regex confusion

 
David G Harris
Greenhorn
Posts: 6
In Chapter 6 of the SCJP book, Self Test problem 1 says:

Given:


And the command line:

java Regex2 "\d*" ab34ef

What is the result?

A. 234
B. 334
C. 2334
D. 0123456
E. 01234456
F. 12334567
G. Compilation fails


The answer is E, and I understand everything except the '6' at the end of the output. I was thinking in terms of indexes, so I don't understand why the output doesn't end at 5.
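
The listing under "Given:" did not come through above. Going only by what the replies in this thread mention (a class named Regex2, two command-line arguments, and a while (b = m.find()) loop that prints m.start() and m.group()), a rough sketch of such a program could look like the following. The run helper and the default arguments are my own additions for illustration, and none of this should be taken as the book's exact text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction based on the thread; not the book's exact listing.
class Regex2 {
    // Helper added for illustration so the loop can be called without a command line.
    static String run(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        StringBuilder out = new StringBuilder();
        boolean b = false;
        while (b = m.find()) {
            // Start index of the match, immediately followed by the matched text.
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: fall back to the thread's arguments when none are supplied.
        String regex = args.length > 0 ? args[0] : "\\d*";
        String input = args.length > 1 ? args[1] : "ab34ef";
        System.out.print(run(regex, input)); // with the thread's arguments: 01234456
    }
}
```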
 
Ulf Dittmer
Rancher
Posts: 42966
That's a FAQ: http://www.coderanch.com/how-to/java/SCJP-FAQ#kb-regexp
 
fadi aboona
Ranch Hand
Posts: 71
Hi,
I have a question about this code and found this post through search, so I thought I'd ask here.
The m.find() method returns true if it gets a match, so how is it possible that the code inside the while loop gets executed before the 34 is reached?

Thank you.
 
Ankit Garg
Sheriff
Posts: 9495
That's because the regular expression did find a match. Since the regular expression is looking for zero or more occurrences of a digit, it finds an empty (zero-length) match at every index where no run of digits starts, including the position just past the last character...
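
To see the difference the quantifier makes, here is a small sketch that lists every match for \d* and for \d+ on the same input. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative names; not part of the original question.
class EmptyMatchDemo {
    // Collects every match as "start:text"; an empty text means a zero-length match.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d*", "ab34ef")); // [0:, 1:, 2:34, 4:, 5:, 6:]
        System.out.println(matches("\\d+", "ab34ef")); // [2:34]
    }
}
```

With \d+ at least one digit is required, so the loop body would run only once, for "34".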
 
fadi aboona
Ranch Hand
Posts: 71
Oh yeah! I never thought of that! I thought the first match had to be a digit followed by zero or more digits. Thanks.
It has been a very busy week here at work, and I couldn't concentrate at all while studying; I felt like I'm hitting
 
Scotty Mitchell
Ranch Hand
Posts: 46
I had posted this in response to another post regarding the same thing, so I thought I'd just copy it here as well.

The regex pattern \d* matches ZERO or MORE digits. The key thing to note is the ZERO possibility.

m.find() Matcher class method definition: Attempts to find the next subsequence of the input sequence that matches the pattern.
m.start() Matcher class method definition: Returns the start index of the previous match.
m.group() Matcher class method definition: Returns the input subsequence matched by the previous match.
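
A quick sketch of how these three methods depend on each other. Note that "the previous match" means start() and group() only have an answer after a find() has succeeded; before that, start() throws IllegalStateException. The class and helper names below are just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative names; not part of the original question.
class MatcherStateDemo {
    // Returns true if start() refuses to answer before any find() has been attempted.
    static boolean startFailsBeforeFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Describes the first match as "start-end 'text'", or "no match" if find() fails.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.start() + "-" + m.end() + " '" + m.group() + "'";
    }

    public static void main(String[] args) {
        System.out.println(startFailsBeforeFind("\\d*", "ab34ef")); // true
        System.out.println(firstMatch("\\d+", "ab34ef"));           // 2-4 '34'
        System.out.println(firstMatch("\\d*", "ab34ef"));           // 0-0 ''
    }
}
```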


On the first iteration, b = m.find() is set to true because the * is satisfied by ZERO digits.

Imagine the string as looking like this |0|a|1|b|2|3|3|4|4|e|5|f|6|
Each number is an index and the entry right after it (a, b, 3, 4, e, f) is the character at that index; 6 is the position just past the last character.

m.start() = 0 (The empty "space" in front of the "a", I guess you could say. The * matches zero digits there, I believe)
m.group() = ""

m.start() = 1
m.group() = ""

m.start() = 2
m.group() = "34" (A NON-EMPTY MATCH THIS TIME!)

Index 3 is skipped because the match "34" covered it, so the next search resumes at index 4. Say it was "345" in the string; then group would be "345" and m.start() would be 5 on the next iteration.
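
Here is a small sketch that records where each match starts and ends, which makes the skipped index visible for both "ab34ef" and the "ab345ef" variant. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative names; not part of the original question.
class SkipDemo {
    // Records each match as "start-end"; equal numbers mean a zero-length match.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("\\d*", "ab34ef"));  // [0-0, 1-1, 2-4, 4-4, 5-5, 6-6]
        System.out.println(spans("\\d*", "ab345ef")); // [0-0, 1-1, 2-5, 5-5, 6-6, 7-7]
    }
}
```

After a non-empty match the next search starts at that match's end index, which is why no match ever starts at 3 in the first case, or at 3 or 4 in the second.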

m.start() = 4
m.group() = ""

m.start() = 5
m.group() = ""

m.start() = 6
m.group() = ""

I believe this is how it works anyway... I ran a quick test with the same pattern against the empty string "" and m.start() came back with index 0!
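
That quick test can be sketched like this (helper names are made up). It also shows where the trailing 6 comes from: the last zero-length match sits at the position just past the final character, so its start index equals the length of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative names; not part of the original question.
class EndOfInputDemo {
    // Start index of the last match found, or -1 if nothing matched.
    static int lastStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = -1;
        while (m.find()) {
            last = m.start();
        }
        return last;
    }

    // Number of times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("\\d*", ""));    // 1
        System.out.println(lastStart("\\d*", ""));       // 0
        System.out.println(lastStart("\\d*", "ab34ef")); // 6
        System.out.println("ab34ef".length());           // 6
    }
}
```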
 
fadi aboona
Ranch Hand
Posts: 71
Thank you.
 