
Regex

 
podonga poron
Ranch Hand
Posts: 55


"\\d" searches for digits. I understand that.

So if I apply "\\d" to a88abc I get "12" (the start indexes of the matches). I understand this!

But if I apply "\\d*" to a88abc I get "013456". WHY?

And if I apply "\\d*" to ab8abc I get "0123456". WHY?

Same with "\\d?".

The HORRIBLE book (I hate it) says:

? is greedy, ?? is reluctant, for zero or one
* is greedy, *? is reluctant, for zero or more
+ is greedy, +? is reluctant, for one or more

WHAT THE **** DOES THAT MEAN!? Plus, English is not my native language. I'm from Spain and I'm doing my best, but I can't understand this shit.

Please, if you can help me I will be very grateful!
 
Taariq San
Ranch Hand
Posts: 192
You need to learn to relax more than you need to learn regex.

Anyway, you quoted it yourself:

? is greedy, ?? is reluctant, for zero or one
* is greedy, *? is reluctant, for zero or more
+ is greedy, +? is reluctant, for one or more


so taking your examples


But if I apply "\\d*" to a88abc I get "013456". WHY?

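Because at index 0 there are zero or more digits: zero of them, since the next character is 'a', so the match is empty.
At index 1 there are zero or more digits, i.e. '88'.
At index 3 there are zero or more digits: zero again, since the next character is 'a'.
And so on.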

I'm sure if you calm down a little you can work out the rest of your examples.
 
Alan Moore
Ranch Hand
Posts: 262
Originally posted by podonga poron:
But if I apply "\\d*" to a88abc I get "013456". WHY?

At index 0 it matches the empty string preceding the first 'a'.
At index 1 it matches "88".
At index 3 it matches the empty string between '8' and 'a'.
At index 4 it matches the empty string between 'a' and 'b'.
At index 5 it matches the empty string between 'b' and 'c'.
At index 6 it matches the empty string following 'c'.
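
To see the list above for yourself, here is a minimal sketch. It assumes the book's example uses java.util.regex with a Matcher.find() loop and prints m.start() for each match; the class name DigitStarDemo and the helper matchStarts are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitStarDemo {
    // Hypothetical helper: concatenates the start index of every match,
    // which is the form of output quoted in the question.
    static String matchStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.start());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print each match with its start index so the empty matches are visible.
        Matcher m = Pattern.compile("\\d*").matcher("a88abc");
        while (m.find()) {
            System.out.println(m.start() + " -> \"" + m.group() + "\"");
        }
        System.out.println(matchStarts("\\d", "a88abc"));   // 12
        System.out.println(matchStarts("\\d*", "a88abc"));  // 013456
        System.out.println(matchStarts("\\d*", "ab8abc"));  // 0123456
    }
}
```

Every line printed by the loop except the one at index 1 shows an empty pair of quotes: those are the zero-length matches.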

The parts that are hardest to understand are:

At index 3: it just finished matching two digits; why does it match again at the index where that match ended?

Answer: The regex is allowed to match zero characters, so it will always match at every position where it's tried.

At index 6: the string is only six characters long, which means the last valid index is 5; how can it match something at index 6?

Answer: it isn't matching a character, it's matching the nothing after the last character. It might help if you think of it as being between the last character and the end of the string, since regexes let you match the end of a string with the '$' metacharacter.
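
A small sketch of that last point, again assuming java.util.regex (the class EndOfInputDemo and the helper spans are hypothetical names). It prints each match as start-end, so a zero-length match shows up as a pair of equal numbers, and it shows that '$' on its own matches at index 6 just like "\\d*" does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndOfInputDemo {
    // Hypothetical helper: lists every match as "start-end", separated by spaces.
    static String spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(m.start()).append('-').append(m.end());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a88abc";
        System.out.println(spans("$", input));     // 6-6
        System.out.println(spans("\\d*", input));  // 0-0 1-3 3-3 4-4 5-5 6-6
        System.out.println(spans("\\d+", input));  // 1-3
    }
}
```

If the empty matches are not wanted, "\\d+" is usually the better choice, because it needs at least one digit and so cannot match nothing.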

The HORRIBLE book (I hate it) says:


I don't know about the book, but I agree that this part of it is horrible. This question is constantly being asked here because the authors did such a terrible job of explaining it.
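
As for what "greedy" and "reluctant" mean in practice: a greedy quantifier takes as many characters as it can, and a reluctant one takes as few as it can. Here is a rough sketch, assuming java.util.regex; the class GreedyVsReluctantDemo and the helper matches are illustrative names, not anything from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Hypothetical helper: collects the text of every match found in the input.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a88abc";
        // Greedy: one or more digits, as many as possible, so both 8s in one match.
        System.out.println("\\d+  -> " + matches("\\d+", input));   // [88]
        // Reluctant: one or more digits, as few as possible, so one 8 per match.
        System.out.println("\\d+? -> " + matches("\\d+?", input));  // [8, 8]
        // Reluctant zero-or-more: "as few as possible" is zero, so every match is empty.
        System.out.println("\\d*? -> " + matches("\\d*?", input).size() + " empty matches");
    }
}
```

With "\\d*?" the regex is satisfied with zero digits at every position, so it never consumes the 8s at all.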
 