Pattern Matching

Abhi vijay
Ranch Hand

Joined: Sep 16, 2008
Posts: 509
Source: Inquisition


Here \w.*?\d means a letter, a digit, or an underscore, followed by zero occurrences of ".", followed by a digit.

So the answer should be
7 s8
10 93

But the answer is
0 asf jgds8
10 93
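
The program from the mock question is not reproduced in this post. A minimal sketch that prints the output quoted above, assuming the source string is "asf jgds8 93" (inferred from the matches, not confirmed by the question) and that each line is the match's start index followed by the matched text. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuestion {
    // Collects "start group" for every match found, left to right.
    static List<String> run(String regex, String source) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            lines.add(m.start() + " " + m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed source string; the original one is not shown in the thread.
        for (String line : run("\\w.*?\\d", "asf jgds8 93")) {
            System.out.println(line);
        }
    }
}
```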


Can anyone explain what is happening?
Dhruv Goel
Greenhorn

Joined: Jan 08, 2009
Posts: 9
Hey,
this is the reluctant quantifier. The regex engine always reads the source from the left. A reluctant quantifier takes as few characters as it can and grows only when the rest of the pattern fails, whereas a greedy one takes as many as it can and then gives characters back.
So the regex engine looks for the shortest possible match, reading from the left.

In other words, the pattern finds the first occurrence of:

"a letter, digit, or _, then as few characters as possible, then a digit, reading the source from the left one character at a time"


scjp 1.5 ------> 100%
Punit Singh
Ranch Hand

Joined: Oct 16, 2008
Posts: 952
1) First is \\w: the regex engine starts from the left and finds "a", which satisfies \\w.

2) Second is .*?: zero or more occurrences of "." (any character). Zero occurrences also satisfy it, so there is no need to consume "s" yet.

3) Third is \\d: after "a" the engine finds "s", which is not a \\d.

4) So the engine goes back to step 2 (.*?) and lets it match more than zero characters, so it consumes "s".

5) Then for \\d it reads "f": not satisfied, so it goes back again.

6) Now .*? also takes "f". For \\d it reads the space character: not satisfied, so it goes back to .*? again.

7) You can see how reluctant the engine is here: .*? takes one more character only when \\d fails on the next one, and then \\d is tried again on the character after that.

8) All this happens because \\d is a required boundary. So remove \\d, run it with "\\w.*?", try to work out what the engine is doing, and ask if you have any doubts.

9) Try "\\w.*\\d" as well.

If you understand how it works, try to tell me what is happening.
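
The steps above can be imitated by hand. This is only a sketch: it assumes the source string is "asf jgds8 93" and emulates the reluctant .*? by trying a fixed gap of k characters (\w.{k}\d) for k = 0, 1, 2, ... at the start of the input. It is not how java.util.regex is implemented internally, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ReluctantTrace {
    // Returns the smallest k for which \w.{k}\d matches at the start of source, or -1.
    static int smallestGap(String source) {
        for (int k = 0; k <= source.length(); k++) {
            Pattern p = Pattern.compile("\\w.{" + k + "}\\d");
            // lookingAt() anchors the attempt at index 0, like the first find() attempt.
            boolean ok = p.matcher(source).lookingAt();
            System.out.println("k=" + k + " -> " + (ok ? "match" : "no match"));
            if (ok) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed source string; prints one line per attempted gap size.
        System.out.println("gap = " + smallestGap("asf jgds8 93"));
    }
}
```

The first k that succeeds is the number of characters .*? ends up consuming between "a" and the first digit.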


SCJP 6
Abhi vijay
Ranch Hand

Joined: Sep 16, 2008
Posts: 509
Thanks Punit, that was a great explanation.
But doesn't .*? mean exactly zero occurrences of "."?

.* means zero or more occurrences, doesn't it? In the case of .*?, as you said, the regex engine checks one character at a time starting from the left. What about .*: how does the engine check?
alex sandoval
Greenhorn

Joined: Dec 08, 2008
Posts: 26
Abhi vijay wrote:Thanks Punit, that was a great explanation.
But doesn't .*? mean exactly zero occurrences of "."?

.* means zero or more occurrences, doesn't it? In the case of .*?, as you said, the regex engine checks one character at a time starting from the left. What about .*: how does the engine check?


Remember that * is a greedy quantifier, meaning it will find the widest possible match within the string. In this case \w.*\d will give you "asf jgds8 93", but \w.*?\d will give you "asf jgds8" first, which is the minimum match, the opposite of greedy.
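
A small sketch to compare the greedy and reluctant forms side by side, again assuming the source string is "asf jgds8 93". The third pattern drops the trailing \d to show what the reluctant quantifier does when nothing forces it to grow. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns the text of every match, in the order find() reports them.
    static List<String> matches(String regex, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "asf jgds8 93"; // assumed source string
        System.out.println(matches("\\w.*\\d", source));  // greedy
        System.out.println(matches("\\w.*?\\d", source)); // reluctant
        System.out.println(matches("\\w.*?", source));    // reluctant, no trailing \d
    }
}
```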
Punit Singh
Ranch Hand

Joined: Oct 16, 2008
Posts: 952
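.* is very greedy: it always tries to match as much as possible. The engine still starts from the left, but .* first grabs everything up to the end of the input and then gives characters back from the right side, only as far as the rest of the pattern needs.

With the digits 1 to 9, it first takes everything through 9 and, if the rest of the pattern still needs characters, gives back 9, then 8, 7, 6, 5, down to 1.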
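
The snippet that belonged here is missing from the thread. As a stand-in, here is a sketch using a hypothetical source "123456789" and capturing groups, so you can see how many digits the greedy \d* keeps after giving some back to the rest of the pattern. Class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGiveBack {
    // Returns "group1|group2" for a two-group regex that must match the whole source,
    // or null when there is no match.
    static String split(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String source = "123456789"; // hypothetical input
        // \d* takes all nine digits, then gives back one so (\d) can match.
        System.out.println(split("(\\d*)(\\d)", source));
        // Here it has to give back three digits for (\d{3}).
        System.out.println(split("(\\d*)(\\d{3})", source));
    }
}
```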

*? on the other hand is reluctant: it always tries to avoid matching anything. You can say it is somewhat lazy; it does not like work, so it always prefers the smallest match possible.



It starts matching from the left side, that is from 1, and it has no boundary here (nothing like \\d*?\\d), so \\d*? shows its laziness and matches zero characters each time. You can run this and see its output.
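
The original snippet is not preserved, so this is only a sketch with a hypothetical source "123456789". With nothing after it, \d*? is satisfied by the empty string, so every match find() reports is empty. Class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyAlone {
    // Returns the text of every match; an empty string means a zero-length match.
    static List<String> groups(String regex, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            System.out.println(m.start() + " [" + m.group() + "]");
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input: one zero-length match is reported at each index 0..9.
        groups("\\d*?", "123456789");
    }
}
```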



Now we are giving it a boundary, \\d, so \\d is compulsory. The engine will match as little as possible for \\d*? (zero characters here) and one character for \\d.
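
A sketch of this case, again with a hypothetical source "123456789" since the original snippet is missing. Each match should be a single digit: zero characters for \d*? and one for the compulsory \d. Class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyWithBoundary {
    // Returns the text of every match of regex in source.
    static List<String> groups(String regex, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input: every digit is matched on its own.
        System.out.println(groups("\\d*?\\d", "123456789"));
    }
}
```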



Now we are giving two boundaries for \\w*?. The first is \\d, so the engine matches 1 with \\d. For \\w*? it matches zero characters and then comes to the last boundary, \\d, but "a" does not match \\d, so the engine is forced to match more than zero characters for \\w*?. It keeps adding one character at a time to \\w*? until the next character is a \\d.
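
A sketch of the two-boundary case. The source strings "1abc9", "19" and "1a2b9" are assumptions (the thread only tells us the original started with 1, continued with "a", and ended with 9), and the capturing groups are added just to expose what \w*? consumed. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoBoundaries {
    // Returns what the reluctant \w*? consumed in the first match, or null if none.
    static String middle(String source) {
        Matcher m = Pattern.compile("(\\d)(\\w*?)(\\d)").matcher(source);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + middle("1abc9") + "]"); // forced to grow past a, b, c
        System.out.println("[" + middle("19") + "]");    // nothing forces it to grow
        System.out.println("[" + middle("1a2b9") + "]"); // stops at the first digit it can
    }
}
```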

If you remove the last 9 from the source and the last \\d from the pattern, then \\w*? will show you its real nature, that is, its laziness.


But if you remove the first \\d, the starting boundary, then you can see its zero-matching style.
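
The snippets for these last two variations are missing, so this sketch rests on two assumptions: that the source becomes "1abc" once the last 9 is removed, and that "remove the first \\d" leaves just \\w*? as the pattern. Class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoBoundary {
    // Returns the text of every match; an empty string means a zero-length match.
    static List<String> groups(String regex, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "1abc"; // assumed source with the last 9 removed
        // No closing boundary: \w*? stays at zero characters, only \d consumes.
        System.out.println(groups("\\d\\w*?", source));
        // No boundary at all: every match is zero-length.
        System.out.println(groups("\\w*?", source));
    }
}
```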


You have to run all this code to completely understand what is going on...
Abhi vijay
Ranch Hand

Joined: Sep 16, 2008
Posts: 509
Thanks a lot Punit, I clearly understood the concept.
Punit Singh
Ranch Hand

Joined: Oct 16, 2008
Posts: 952
Great
 