narain ashwin wrote:
But it is not matching any numbers. Can anyone explain what the pattern "\\d*?" will match?
The patterns \\d* and \\d*? both match 0 or more digits.
But \\d* is 'greedy' and \\d*? is 'lazy'.
'Greedy' means that the quantifier first consumes as many characters as it can match, and then the engine tries to match the rest of the pattern.
If the rest doesn't match, the engine 'backtracks': the quantifier gives back one character from the end and the engine tries again, repeating this
until it finds a match or has nothing left to give back.
A 'lazy' quantifier consumes as few characters as possible (0 at the start) and tries to match the rest of the pattern; it takes one more character only when that attempt fails.
Consider the input line '123a' and the patterns \\d* and \\d*?
\\d* first consumes as many digits as it can: '123'. It stops at 'a', because 'a' is not a digit.
Nothing follows \\d* in the pattern, so no backtracking is needed and the result is '123'.
\\d*? first consumes as few characters as possible: the empty string '' before '1'. Nothing follows \\d*? in the pattern, so this already matches and the result is ''
(\\d*? means 0 or more digits, and 0 digits is simply an empty string).
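To check this yourself, here is a minimal sketch using java.util.regex. The class name GreedyVsLazy and the helper firstMatch are my own names for illustration only, and the third pattern (\\d*?a) is an extra case I added to show that a lazy quantifier does grow when the rest of the pattern forces it to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns the first substring found by find(),
    // or null if the pattern does not match anywhere in the input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "123a";
        // Greedy: takes all the digits it can.
        System.out.println("[" + firstMatch("\\d*", input) + "]");   // prints [123]
        // Lazy: zero digits is already a valid match.
        System.out.println("[" + firstMatch("\\d*?", input) + "]");  // prints []
        // Lazy, but the trailing 'a' forces it to expand digit by digit.
        System.out.println("[" + firstMatch("\\d*?a", input) + "]"); // prints [123a]
    }
}
```

So \\d*? on its own will always find an empty match first, which is why it seems to match no numbers.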
The lazy operator can be useful, for example, if you want to match only the first tag <> in an HTML string.
Look at this:
<.*> the greedy pattern doesn't work, because it matches everything from the first < to the last > in the line, and we want only the first <> tag
<.*?> the lazy pattern matches exactly what we want
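A small sketch of the two tag patterns side by side. The sample string "<b>bold</b> text" is made up by me for illustration, and the class name FirstTag and helper firstMatch are hypothetical too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTag {
    // Hypothetical helper: returns the first substring found by find(),
    // or null if the pattern does not match anywhere in the input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> text"; // made-up sample input
        // Greedy: .* runs to the end of the line, then backtracks to the LAST '>'.
        System.out.println(firstMatch("<.*>", html));  // prints <b>bold</b>
        // Lazy: .*? grows one character at a time and stops at the FIRST '>'.
        System.out.println(firstMatch("<.*?>", html)); // prints <b>
    }
}
```

This is only meant to show the quantifier behaviour; for real HTML a proper parser is usually the safer choice.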