

JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)

Simple Regex

Tim Eapen
Greenhorn

Joined: May 28, 2006
Posts: 22
Hello everybody:

Here is some simple code:
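A minimal sketch of a comparable greedy search. It assumes the pattern "\\d*" (the greedy counterpart of the "\\d*?" pattern used later in this post) and a hypothetical input "ab56ef"; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Appends "start + matched text" for every match find() reports, in order.
    static String trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; the input used in the original post is not known.
        System.out.println(trace("\\d*", "ab56ef"));
    }
}
```

With that assumed input it prints 01256456: empty matches at 0 and 1, "56" starting at 2, then empty matches at 4, 5 and 6 (the end of the input).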



Here is the output:



I can understand this. I don't understand the following.

Now I will change the pattern so that it uses a reluctant quantifier: Pattern reluctant = Pattern.compile("\\d*?");

When I do a match on the same input I get the following output:



It seems as if the pattern is matching the empty string on every character it encounters for reluctant matching. Why? This doesn't seem intuitive to me.
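A sketch that prints the start and end index of each match, so the zero-length matches are visible. The input "ab56ef" is hypothetical and the names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Lists each match find() reports as "[start,end)='text'".
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append('[').append(m.start()).append(',').append(m.end())
               .append(")='").append(m.group()).append("' ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical input; "\\d*?" is already satisfied by zero digits.
        System.out.println(describe("\\d*?", "ab56ef"));
        // Greedy version for comparison.
        System.out.println(describe("\\d*", "ab56ef"));
    }
}
```

For the reluctant pattern every reported match has start == end, one at each position from 0 through 6, even where a digit is present. The greedy pattern instead reports [2,4)='56' at index 2.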

Tim
Franz Fountain
Ranch Hand

Joined: Nov 15, 2006
Posts: 58
My guess is that the reluctant quantifier finds the shortest string that will match the pattern. That is the empty string, since "\\d*?" can match zero or more digits. It seems that, no matter what the input string is, "\\d*?" will always match the empty string at every position. I guess the *? quantifier only makes sense when it is followed by something. For example, "\\d*?6" would make sense in this example.
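Franz's point can be checked with a small sketch; the input "1661" and the helper name are hypothetical. Once something follows the reluctant quantifier, it no longer settles for an empty match. Instead it takes the fewest digits that still let the following 6 match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierContrast {
    // Returns every substring that find() matches, in order.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "1661"; // hypothetical input containing two 6s
        // Greedy: from each start, the longest digit run that ends in 6.
        System.out.println(matches("\\d*6", input));
        // Reluctant: from each start, the shortest digit run that ends in 6.
        System.out.println(matches("\\d*?6", input));
    }
}
```

The greedy pattern prints [166]; the reluctant one prints [16, 6].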

This is a good question. I hope someone with more experience with regex will give a more definitive answer.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18139

It seems as if the pattern is matching the empty string on every character it encounters for reluctant matching. Why? This doesn't seem intuitive to me.


As already mentioned, the reluctant pattern you created will match an empty string (a zero-length match). What doesn't seem intuitive is what happens afterwards. To understand that, you need to understand what the find() method does. Here is the relevant quote from the JavaDoc:

This method starts at the beginning of the input sequence or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.


Normally, the find() method starts the search for the next match at the end of the previous match. The exception is a zero-length match: in that case, it starts at the next character, one position past the empty match. This is why it is matching "on every character it encounters".
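To make the advance rule concrete, here is a sketch that re-implements it by hand with Matcher.find(int) and compares the result with a plain find() loop. It is an illustration, not the JDK's actual implementation, and it assumes simple patterns like these (no anchors or lookbehind whose behavior depends on matcher state). The class and method names and the input "ab56ef" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAdvance {
    // Uses find() directly and records "start:text" for each match.
    static List<String> viaFind(Pattern p, String input) {
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + ":" + m.group());
        }
        return out;
    }

    // Applies the rule by hand: resume at the previous end,
    // or one character past it if the previous match was empty.
    static List<String> viaManualAdvance(Pattern p, String input) {
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        int from = 0;
        // find(int) throws if from > input.length(), so stop before that.
        while (from <= input.length() && m.find(from)) {
            out.add(m.start() + ":" + m.group());
            from = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ab56ef"; // hypothetical input
        for (String regex : new String[] {"\\d*", "\\d*?"}) {
            Pattern p = Pattern.compile(regex);
            System.out.println(regex + " find():  " + viaFind(p, input));
            System.out.println(regex + " manual:  " + viaManualAdvance(p, input));
        }
    }
}
```

Both loops agree for both patterns. Without the "+ 1" step for empty matches, the manual loop would find the same empty match at index 0 forever, which is why find() has to skip ahead.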

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)