Your regex is incorrect, and does not match your description of it. It is missing the ending > of the start tag. You can verify by adding some more capturing groups and then checking the results:
0: <boo>bold</b> (everything)
1: b ([A-Z][A-Z0-9]*)
2: o (the non-optional [^>])
3: o>bold (everything up to </b>)
A quick fix in the regex: <([A-Z][A-Z0-9]*)[^>]*>.*?</\\1>
The [^>] is made optional by allowing it zero or more times, and the closing > of the start tag is added. If I keep the same capturing groups (around [^>]* and around .*?), the output is then this:
1: b (because you are looking for the end tag </b>)
2: oo ([^>]*)
3: bold (.*?)
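To reproduce both listings, here is a minimal sketch. It assumes the test string is <boo>bold</b>, that matching is case-insensitive, and that the original regex was the quick fix minus the closing > and the star. None of that is spelled out in this thread, so treat those as my assumptions. The class name TagGroups and the helper groups are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroups {
    // Returns groups 0..n of the first match, or null when nothing matches.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<boo>bold</b>"; // assumed test string
        // Assumed original regex: mandatory [^>] and no closing > on the start tag.
        String broken = "<([A-Z][A-Z0-9]*)([^>])(.*?)</\\1>";
        // Quick fix: [^>]* may match nothing, and the closing > is present.
        String fixed = "<([A-Z][A-Z0-9]*)([^>]*)>(.*?)</\\1>";
        for (String regex : new String[] {broken, fixed}) {
            String[] g = groups(regex, input);
            System.out.println(regex);
            for (int i = 0; i < g.length; i++) {
                System.out.println("  " + i + ": " + g[i]);
            }
        }
    }
}
```

Group 1 is b for both patterns. Only groups 2 and 3 change, because the fixed pattern lets [^>]* absorb the whole oo before the >.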
Sorry, I am not getting it. Please see the link and the topic I mentioned in my first post. I am more interested in how the regex engine works in the above case than in the exact output. In my first post I tried to put down my understanding on a token-by-token basis. I know it may not be completely correct, but it is not completely wrong either.
That is what I would like to understand: how the regex works. Thanks for all the effort you have put into explaining, but if someone could explain it taking every token into account, it would be more helpful, so that I can zero in on the error in my understanding.
On the site I mentioned in my first post, they say the following, and I am unable to follow it:
Let's take the regex <([A-Z][A-Z0-9]*)[^>]*>.*?</\1> without the word boundary and look inside the regex engine at the point where \1 fails the first time. First, .*? continues to expand until it has reached the end of the string, and </\1> has failed to match each time .*? matched one more character.
Then the regex engine backtracks into the capturing group. [A-Z0-9]* has matched oo, but would just as happily match o or nothing at all. When backtracking, [A-Z0-9]* is forced to give up one character. The regex engine continues, exiting the capturing group a second time. Since [A-Z][A-Z0-9]* has now matched bo, that is what is stored into the capturing group, overwriting boo that was stored before. [^>]* matches the second o in the opening tag. >.*?</ matches >bold<. \1 fails again.
The regex engine does all the same backtracking once more, until [A-Z0-9]* is forced to give up another character, causing it to match nothing, which the star allows. The capturing group now stores just b. [^>]* now matches oo. >.*?</ once again matches >bold<. \1 now succeeds, as does > and an overall match is found. But not the one we wanted.
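The three attempts described in the quote can be imitated by hand. This is only a rough sketch, not how java.util.regex works internally: it pins the tag name to each candidate the capturing group holds in turn (boo, bo, b) and checks whether the rest of the pattern can still match. It assumes the input <boo>bold</b> and case-insensitive matching, and that the word boundary the quote mentions is a \b placed directly after the capturing group. The names BacktrackTrace and restMatches are made up for the example.

```java
import java.util.regex.Pattern;

public class BacktrackTrace {
    static final String INPUT = "<boo>bold</b>"; // assumed test string

    // Imitates one attempt: the capturing group is pinned to `name`, and
    // the remainder of the pattern has to match the whole input.
    static boolean restMatches(String name, boolean wordBoundary) {
        String regex = "<" + Pattern.quote(name)
                + (wordBoundary ? "\\b" : "")
                + "[^>]*>.*?</" + Pattern.quote(name) + ">";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(INPUT).matches();
    }

    // Runs the real pattern with the backreference, with \b after the group.
    static boolean fullPatternWithBoundaryFinds() {
        return Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\1>",
                Pattern.CASE_INSENSITIVE).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // Same order in which backtracking shrinks the group: boo, bo, b.
        for (String name : new String[] {"boo", "bo", "b"}) {
            System.out.println("group 1 = " + name
                    + " | without \\b: " + restMatches(name, false)
                    + " | with \\b: " + restMatches(name, true));
        }
        System.out.println("full pattern with \\b finds a match: "
                + fullPatternWithBoundaryFinds());
    }
}
```

Without \b, only the last candidate b succeeds, which is the unwanted match the quote ends on. With \b, bo and b are rejected because there is no word boundary between two letters, and boo still fails on the closing tag, so nothing matches at all.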