JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)

regex

Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

I am not able to understand the problems on regex (greedy quantifiers). For instance, consider the following problem:



How does it produce the output 1 b345 f0?

Can anybody PLEASE help me with this!


OCPJP
Kevin Workman
Ranch Hand

Joined: Sep 28, 2010
Posts: 151
Komal Arora wrote: how does it produce the output 1 b345 f0


What did you expect it to produce?
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

I did not understand how the digit 5 came to be in the answer. The book says that greedy quantifiers scan the entire source data and then work backwards to find the appropriate match. I always get confused about how that works!
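
The "scan everything, then work backwards" behaviour is easiest to see by comparing a greedy quantifier with its reluctant counterpart. The sketch below is only an illustration: the class name GreedyVsReluctant, the helper describe() and the sample input "yyxxxyxx" are assumptions made for this example and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the original post.
class GreedyVsReluctant {

    // Returns every match as "start group", joined with " | ".
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(" | ");
            }
            out.append(m.start()).append(' ').append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "yyxxxyxx"; // assumed sample input

        // Greedy: .* first grabs the whole input, then gives characters
        // back until the trailing "xx" can match.
        System.out.println(describe(".*xx", input));  // 0 yyxxxyxx

        // Reluctant: .*? grabs as little as possible before "xx".
        System.out.println(describe(".*?xx", input)); // 0 yyxx | 4 xyxx
    }
}
```

With the greedy version there is a single match covering the whole input; the reluctant version stops at the first possible "xx" and therefore finds two shorter matches.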
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

Oh no wait, got it.

Dammit, lack of concentration!
Kevin Workman
Ranch Hand

Joined: Sep 28, 2010
Posts: 151
Komal Arora wrote: oh no wait, got it


Cool. You might want to post what you figured out, in case anybody else has a similar problem. Your post might come up in a Google search that somebody else runs in the future.
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

We need to find matches for the pattern [a-f]\d+, i.e. one letter in the range a to f, followed by one or more (the + quantifier) digits in a row.
One such match is found at position 1 (b34) and the next at position 5 (f0). Each is printed as its start position, a space, and the match, with nothing between the two, hence the output 1 b345 f0.
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1756
    

The pattern will match a sequence of characters which consists of exactly one occurrence of one of the characters a, b, c, d, e or f, followed by at least one digit.
Now, given the String ab34ef0, which sub-sequences match this pattern?

Let's start at index 0 and work our way through the sequence:

0 - No match here, a is a matching character, but it should be followed by one or more digits, and b certainly isn't that.

1 - Found a match! b is a matching character, followed by 3, which is a digit! So are we done with this match? Not quite, because the greedy + quantifier will try to match as much of the sequence as it can, and the next character in the sequence is 4, which is also a digit. Now we're done with this match, because the next character in the sequence is e, which would break the pattern. Right, so now we print the starting position of this match (Matcher.start()), which is 1, and the match itself (Matcher.group()), which is b34, to the console, separated by a single space. We don't add a line separator, because we made a call to print(), not println().

4 - Wait a minute, isn't 1 usually followed by 2? Well yes, but the previous match has 'consumed' the sequence up to and including the index position where that match ended: 3. OK, so starting at position 4 we find e followed by f, which doesn't match the pattern.

5 - Found another one! The sub-sequence f0 matches the pattern quite nicely. So let's print that to the console as well: 5 f0. Now, because we're using print() instead of println(), the output will be appended directly to the previous output.

Now we're done - the entire sequence has been consumed - and the output reads 1 b345 f0.
And that's the way the cookie crumbles
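
For anyone who wants to replay the walkthrough, here is a minimal sketch. The program from the original question is not shown in this thread, so this is not that listing: the class name WalkthroughDemo and the helper run() are made up for illustration, and only the pattern [a-f]\d+, the input ab34ef0 and the use of Matcher.start(), Matcher.group() and print() come from the discussion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the original program is not shown in the thread.
class WalkthroughDemo {

    // Appends "start group" for every match with no separator in between,
    // which mimics calling print() instead of println().
    static String run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // m.end() is the index just past the match; the next find()
            // resumes from there, which is why index 2 and 3 are skipped.
            System.out.println("match from " + m.start() + " to " + m.end()
                    + " (exclusive): " + m.group());
            out.append(m.start()).append(' ').append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Note the doubled backslash inside a Java string literal.
        System.out.println(run("[a-f]\\d+", "ab34ef0")); // 1 b345 f0
    }
}
```

The trace lines show the two matches (b34 from 1 to 4, f0 from 5 to 7) before the concatenated result is printed.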

Edit: Oh crud, typing up my reply took longer than I thought, and the question has been answered in the meantime. Oh well...


Build a man a fire, and he'll be warm for a day. Set a man on fire, and he'll be warm for the rest of his life.
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

Wow, that was a very nice explanation.

I was just going through regex questions when I found this one:



and the command line:

java Regex2 "\d*" ab34ef

To this the output is: 01234456
Now how did that come about?
Please, please answer this too!
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1756
    

Well, this has to do with zero-length matching, as described in this tutorial.
I suggest you read that first, and then come back to give this one a try yourself.

Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

OKAY. So the * quantifier says "zero or more occurrences".
Our string is ab34ef.

At index 0 - there are zero occurrences of a digit, hence 0 is printed.

At index 1 - there are again zero occurrences, hence 1 is printed.

At index 2 - the digits 34 match, hence 234 is printed (start index 2 followed by the match 34).

At index 4 - zero occurrences, we print 4.

At index 5 - zero occurrences, we print 5.

Where did 6 come from then?
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1756
    

That would be the zero-length match after the last character in the sequence.
I didn't want to spoil anything before, but here's the JavaRanch FAQ entry about this: http://faq.javaranch.com/java/ScjpFaq#kb-regexp
Have a look.
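
The zero-length matches are hard to spot in 01234456 because an empty match prints nothing after its index. A small sketch that puts each match in brackets makes them visible. The class name ZeroLengthDemo and the helper trace() are assumptions made for this illustration; the pattern \d* and the input ab34ef are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, written only to make empty matches visible.
class ZeroLengthDemo {

    // Formats each match as start[group]; an empty match shows up as n[].
    static String trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.start()).append('[').append(m.group()).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "ab34ef" has indices 0 to 5; index 6 is the position after 'f'.
        System.out.println(trace("\\d*", "ab34ef")); // 0[] 1[] 2[34] 4[] 5[] 6[]
    }
}
```

Dropping the brackets and spaces from that line gives back 01234456, with the final 6 being the empty match at the end of the input.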
Neha Daga
Ranch Hand

Joined: Oct 30, 2009
Posts: 504
Read the chapter in K&B carefully. It says that a quantifier will also be checked at the position after the last character in the string. In this case that position matches the pattern, which asks for zero or more digits.

Well, I am late.


SCJP 1.6 96%
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

Hey, finally got the problem. Thanks Jelle!

And Neha, does this hold for all three quantifiers? Do ALL of them look at the position after the last character of the string?
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1756
    

Let's answer that with a question: is a zero-length match a possibility for all three quantifiers?
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

Nope, only for * and ?
So that means only * and ? will match at the position after the end of the string?
Jelle Klap
Bartender

Joined: Mar 10, 2008
Posts: 1756
    

Bingo.
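
A quick way to check this is to run the three quantifiers over the same input and list where each match starts. This is just a sketch: the class name QuantifierComparison and the helper starts() are hypothetical, and the input ab34ef is reused from the earlier question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing *, ? and + on the same input.
class QuantifierComparison {

    // Collects the start index of every match, including zero-length ones.
    static List<Integer> starts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "ab34ef"; // length 6, so 6 is the position after the end
        for (String regex : new String[] {"\\d*", "\\d?", "\\d+"}) {
            List<Integer> s = starts(regex, input);
            System.out.println(regex + " -> " + s
                    + " matches after the last character: "
                    + s.contains(input.length()));
        }
        // \d* -> [0, 1, 2, 4, 5, 6] matches after the last character: true
        // \d? -> [0, 1, 2, 3, 4, 5, 6] matches after the last character: true
        // \d+ -> [2] matches after the last character: false
    }
}
```

Only * and ? can match zero characters, so only they report a match at index 6; + needs at least one digit and finds nothing there.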
Komal Arora
Ranch Hand

Joined: Sep 30, 2010
Posts: 91

YAY

You are a good teacher
 