
When doing Matcher start on pattern \d* returned index seems off

Rick Reumann
Ranch Hand

Joined: Apr 03, 2001
Posts: 281
I'm a bit confused why Matcher's start method is returning an index that I would think would be out of bounds on the following text to search...



Result:
0
1
2
3 34
5
6
7
8   <-- Why this index?

According to the spec for the "start" method of Matcher (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html), it "Returns the start index of the previous match."

To me this doesn't seem to make sense. The previous match of the char "f" has the starting index of 7. I understand it's at position 7,8 but the docs claim that it returns the "start index" (not ending index) of the previous match. Also, if it is supposed to return the ending index then I would think the first thing printed would be a '1' not a 0. I'm sure I'm just missing something simple here.

Thanks for any help.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19065
    

Your regular expression can match an empty string -- in fact, most of the matches are empty matches.

Index 8 is the empty match, at the end of your string.
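
To make the empty matches visible, here is a minimal sketch that prints the start and end index of every match. The original code is not shown in this thread, so the input "abc34def" is only an assumption chosen to be consistent with the posted result (8 characters, digits at indexes 3 and 4); the class name EmptyMatchDemo and the helper describeMatches are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {

    // Hypothetical helper: collects "start-end:[group]" for every match find() reports.
    static List<String> describeMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":[" + m.group() + "]");
        }
        return found;
    }

    public static void main(String[] args) {
        // "abc34def" is an assumed input, not the original poster's string.
        for (String match : describeMatches("\\d*", "abc34def")) {
            System.out.println(match);
        }
        // Prints:
        // 0-0:[]
        // 1-1:[]
        // 2-2:[]
        // 3-5:[34]
        // 5-5:[]
        // 6-6:[]
        // 7-7:[]
        // 8-8:[]
    }
}
```

The last line is the zero-length match between the final character and the end of the input, so start() returns 8 even though the last character sits at index 7.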

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Matt Russell
Ranch Hand

Joined: Aug 15, 2006
Posts: 165
This link has a good explanation for a similar Regexp question: http://faq.javaranch.com/view?ScjpFaq#kb-regexp


Matt
Inquisition: open-source mock exam simulator for SCJP and SCWCD
athakur athakur
Greenhorn

Joined: Jun 27, 2006
Posts: 8
I got the explanation you guys gave above.

I have one doubt, though.

If I modify the code to something like this:

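import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test { // class name is illustrative
    public static void main(String[] arg) {
        Pattern p = Pattern.compile("\\d*?"); // reluctant quantifier
        Matcher m = p.matcher("ab34ef");
        while (m.find()) {
            // start index immediately followed by the matched text
            System.out.print(m.start() + m.group());
        }
    }
}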

After changing the pattern from greedy to reluctant, I got the output: 0123456

Can anyone please explain this to me? It is reluctant, but it should at least print 3 and 4 with index 2 and 3 respectively.

Thanks
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19065
    

Can anyone please explain this to me? It is reluctant, but it should at least print 3 and 4 with index 2 and 3 respectively.


Reluctant means that it should match the smallest match possible -- and in this case, the smallest possible is an empty match.
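
As a rough illustration of that point, the sketch below runs three quantifiers over the same input "ab34ef" from the post above. The class name ReluctantDemo and the helper trace are hypothetical; each match is shown as its start index followed by the matched text in brackets, so empty matches show up as [].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {

    // Hypothetical helper: returns each match as "start[group]", separated by spaces.
    static String trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.start()).append('[').append(m.group()).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        // Greedy, zero or more: takes "34" in one match, empty everywhere else.
        System.out.println(trace("\\d*", input));  // 0[] 1[] 2[34] 4[] 5[] 6[]
        // Reluctant, zero or more: an empty match always suffices, so no digit is consumed.
        System.out.println(trace("\\d*?", input)); // 0[] 1[] 2[] 3[] 4[] 5[] 6[]
        // Reluctant, one or more: must consume at least one digit per match.
        System.out.println(trace("\\d+?", input)); // 2[3] 3[4]
    }
}
```

If the goal is to see 3 and 4 reported separately at indexes 2 and 3, a quantifier that requires at least one digit, such as \d+?, is one way to get there.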

Henry
 