Parsing, Tokenizing and Formatting

Chidimma Juliana
Greenhorn

Joined: Jul 14, 2011
Posts: 18
Hello,

Please could someone explain how to solve the regex problem in Self Test question 1 of Chapter 6 in K&B? The book's explanation is not clear to me.



And the command line:

java Regex2 "\d*" ab34ef

The answer in the book is 01234456.

Thank you.

Chidimma.
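
The code listing from the question is not preserved in this thread. As a minimal sketch only, assuming the program compiles args[0] as the pattern, matches it against args[1], and prints m.start() followed by m.group() on every successful find(), it might look like the following. The run helper and the fallback arguments in main are hypothetical additions so the logic can be exercised without a command line; this is not the book's exact listing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed reconstruction for illustration, not the book's listing.
class Regex2 {
    // Hypothetical helper: appends m.start() then m.group() for each match found.
    static String run(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fall back to the values from the question when no arguments are given.
        String regex = args.length > 0 ? args[0] : "\\d*";
        String input = args.length > 1 ? args[1] : "ab34ef";
        System.out.println(run(regex, input)); // prints 01234456 for the defaults
    }
}
```

Under those assumptions, the output is the start indices 0, 1, 2, 4, 5, 6 with the single non-empty group "34" printed right after the 2.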
Scotty Mitchell
Ranch Hand

Joined: Aug 09, 2011
Posts: 46
The regex pattern \d* matches ZERO or MORE digits. The key thing to note is the ZERO possibility.

m.find() Matcher class method definition: Attempts to find the next subsequence of the input sequence that matches the pattern.

m.start() Matcher class method definition: Returns the start index of the previous match.

m.group() Matcher class method definition: Returns the input subsequence matched by the previous match.


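On the first iteration, b = m.find() is set to true because the * is satisfied by ZERO digits, so an empty match succeeds even though no digit is found.

Imagine the string as looking like this |0|a|1|b|2|3|3|4|4|e|5|f|6|
Reading left to right, the entries alternate between an index (0 through 6) and a character of the string (a, b, 3, 4, e, f). Index 6 is the position just past the last character.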

m.start() = 0 (The empty "space" in front of the "a", I guess you could say. The darn * allows a zero-length match there, I believe)
m.group() = ""

m.start() = 1
m.group() = ""

m.start() = 2
m.group() = "34" MATCH OCCURRED!

Index 3 is skipped because the match "34" consumed the characters at indices 2 and 3, so the next find() resumes at index 4. If the string had contained "345" instead, group would be "345" and m.start() would be 5 on the next iteration.

m.start() = 4
m.group() = ""

m.start() = 5
m.group() = ""

m.start() = 6
m.group() = ""

I believe this is how it works, anyway. I ran a quick test with the same pattern against the empty string "" and find() still matched, with m.start() returning 0!
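
The walkthrough above can be checked with a small trace. This is a sketch, not part of the book's question: the FindTrace class and its trace method are hypothetical names, and each entry records m.start(), m.end() and m.group() so that the jump from index 2 to index 4 is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTrace {
    // Hypothetical helper: one "start-end:group" entry per successful find().
    static List<String> trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> steps = new ArrayList<>();
        while (m.find()) {
            steps.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return steps;
    }

    public static void main(String[] args) {
        // Six matches; only 2-4 is non-empty, and the next search resumes at 4.
        System.out.println(trace("\\d*", "ab34ef")); // [0-0:, 1-1:, 2-4:34, 4-4:, 5-5:, 6-6:]
        // The empty string still yields one zero-length match at index 0.
        System.out.println(trace("\\d*", ""));       // [0-0:]
    }
}
```

After a zero-length match, the matcher moves ahead by one position before searching again, which is why the empty matches land on consecutive indices instead of repeating forever at the same spot.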
O. Ziggy
Ranch Hand

Joined: Oct 02, 2005
Posts: 430

I actually got this wrong because I thought compilation would fail because of the command line.



I thought that the \d would have required an additional escape character, i.e. it should have been \\d
Scotty Mitchell
Ranch Hand

Joined: Aug 09, 2011
Posts: 46
O. Ziggy wrote: I actually got this wrong because I thought compilation would fail because of the command line.



I thought that the \d would have required an additional escape character, i.e. it should have been \\d



I think that would only be true if you wrote the pattern as a string literal in your Java source, where the compiler requires \\d. A command-line argument never goes through the compiler, though I didn't go test that. I'm assuming the OCPJP exam writers wouldn't be harsh enough to throw in a compiler-error question based on syntax like that, especially since you only need to know the basics of regex, but who knows. I haven't taken the exam yet... A WEEK or SO TO GO!!! ahh
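
To make the escaping point concrete, here is a small sketch (the EscapeDemo class and patternLiteral method are hypothetical names). In Java source the literal must be written "\\d*", because "\d*" is rejected by the compiler as an illegal escape character, yet the resulting String holds only three characters: a backslash, d and *. Those are the same three characters the program receives in args[0], assuming the shell passes "\d*" through unchanged.

```java
class EscapeDemo {
    // The source-code spelling of the pattern; the String itself is \d*
    static String patternLiteral() {
        return "\\d*";
    }

    public static void main(String[] args) {
        String fromSource = patternLiteral();
        // Build the same text character by character, as it would arrive in args[0].
        String asTyped = new String(new char[] {'\\', 'd', '*'});
        System.out.println(fromSource.length());        // 3
        System.out.println(fromSource.equals(asTyped)); // true
    }
}
```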
Ananya Raval
Greenhorn

Joined: Aug 27, 2011
Posts: 5

m.start() = 6
m.group() = ""


Do we have to count the position after the end of the input String (index 6 here)?
|0|a|1|b|2|3|3|4|4|e|5|f|6|
Is it always considered?
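
One way to explore this question is to compare \d* with \d+ on the same input. The sketch below (EndPositionDemo and starts are hypothetical names) collects every m.start() value. With \d*, which can match zero characters, the end-of-input position 6 is reported; with \d+, which needs at least one digit, it is not. So the end position produces a match only when the pattern is able to match an empty string there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndPositionDemo {
    // Hypothetical helper: the start index of every match found by find().
    static List<Integer> starts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(starts("\\d*", "ab34ef")); // [0, 1, 2, 4, 5, 6]
        System.out.println(starts("\\d+", "ab34ef")); // [2]
    }
}
```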
 