JavaRanch » Java Forums » Java » Beginning Java

Regex

podonga poron
Ranch Hand

Joined: May 12, 2008
Posts: 55


"\\d" searches for digits. I understand that.

So if I apply "\\d" to a88abc, I get "12". I understand this!

But if I apply "\\d*" to a88abc, I get "013456". WHY??

And if I apply "\\d*" to ab8abc, I get "0123456". WHY??

The same thing happens with "\\d?".
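To see where those digit strings come from, here is a minimal sketch. The book's code isn't quoted in this thread, so it assumes the exercise loops over Matcher.find() and prints m.start() for each match; the class MatchStarts and its starts method are hypothetical names. The indices are concatenated with no separator, so "12" means one match starting at index 1 and another starting at index 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction of the exercise: find every match of a regex
// and concatenate the start index of each match into one string.
class MatchStarts {

    static String starts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.start()); // index where this match begins
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(starts("\\d", "a88abc"));   // 12
        System.out.println(starts("\\d*", "a88abc"));  // 013456
        System.out.println(starts("\\d*", "ab8abc"));  // 0123456
        System.out.println(starts("\\d?", "a88abc"));  // 0123456
    }
}
```

Note that "\\d?" on a88abc gives a different string from "\\d*" (it matches each '8' separately), but the puzzling part, matches at positions holding no digit, is the same.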

The HORRIBLE book (I hate it) says:

? is greedy, ?? is reluctant, for zero or one
* is greedy, *? is reluctant, for zero or more
+ is greedy, +? is reluctant, for one or more
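The book's table can be made concrete. In this hedged sketch (the class GreedyVsReluctant, the helper firstGroup and the sample patterns are illustrative, not from the book), each pattern starts with a literal 'a' and captures what the quantified "\\d" takes right after it in a88abc. A greedy quantifier grabs as many digits as it can; the reluctant version (the extra '?') grabs as few as it can, and grows only when the rest of the pattern would otherwise fail, as the last pattern shows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Greedy quantifiers take as many digits as they can; reluctant ones
// (with a trailing '?') take as few as they can while still letting the
// whole pattern match.
class GreedyVsReluctant {

    // Returns the text captured by group 1 of the first match, or null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "a88abc";
        String[] patterns = {
            "a(\\d?)",  "a(\\d??)",   // zero or one:  [8]  vs []
            "a(\\d*)",  "a(\\d*?)",   // zero or more: [88] vs []
            "a(\\d+)",  "a(\\d+?)",   // one or more:  [88] vs [8]
            "a(\\d*?)a"               // reluctant, but must grow to "88" to reach the 'a'
        };
        for (String p : patterns) {
            System.out.println(p + " -> [" + firstGroup(p, input) + "]");
        }
    }
}
```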

WHAT THE **** DOES THAT MEAN?!? Plus, English is not my native language. I'm from Spain and I'm doing my best, but I can't understand this shit.

Please, if you can help me, I will be very grateful!
Taariq San
Ranch Hand

Joined: Nov 20, 2007
Posts: 192
You need to learn to relax more than you need to learn regex.

Anyhow, you said it yourself:

? is greedy, ?? is reluctant, for zero or one
* is greedy, *? is reluctant, for zero or more
+ is greedy, +? is reluctant, for one or more


So, taking your examples:


But if I apply "\\d*" to a88abc, I get "013456". WHY??

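Because at index 0 there are zero or more digits (here zero), so it matches the empty string before the 'a'.
At index 1 there are zero or more digits (here two), so it matches '88'.
At index 3 there are zero or more digits (here zero again), so it matches the empty string before the second 'a'.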
and so on.

I'm sure if you calm down a little you can work out the rest of your examples.
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Originally posted by podonga poron:
But if I apply "\\d*" to a88abc, I get "013456". WHY??

At index 0 it matches the empty string preceding the first 'a'.
At index 1 it matches "88".
At index 3 it matches the empty string between '8' and 'a'.
At index 4 it matches the empty string between 'a' and 'b'.
At index 5 it matches the empty string between 'b' and 'c'.
At index 6 it matches the empty string following 'c'.
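Printing the end index and the matched text as well makes these empty matches visible. A sketch using java.util.regex (the class ShowMatches and its describe method are hypothetical names): end() is exclusive, so for an empty match start() equals end() and group() is the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lists every match as "start-end:[text]" so empty matches become visible.
class ShowMatches {

    static List<String> describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // end() is exclusive, so an empty match has start() == end()
            out.add(m.start() + "-" + m.end() + ":[" + m.group() + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : describe("\\d*", "a88abc")) {
            System.out.println(s);
        }
    }
}
```

For "\\d*" on a88abc this prints the six matches listed above: 0-0:[], 1-3:[88], 3-3:[], 4-4:[], 5-5:[], 6-6:[].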

The parts that are hardest to understand are:

At index 3: it just finished matching two digits; why does it match again at the index where that match ended?

Answer: The regex is allowed to match zero characters, so it will always match at every position where it's tried.
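One way to see this is to compare "\\d*" with "\\d+", which must consume at least one digit. In this sketch (the class ZeroWidthDemo and its countMatches helper are hypothetical names), "\\d+" succeeds only once in a88abc, while "\\d*" succeeds at every position where the matcher tries it. In Java's Matcher, when a match is empty the next find() starts one position further on, which is why no index is reported twice; after the non-empty "88" match, the next search starts exactly at index 3, where the empty match is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares a pattern that may match zero characters (\d*) with one that
// must consume at least one digit (\d+), on the same input.
class ZeroWidthDemo {

    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a88abc";
        // \d+ only matches where digits actually are: one match, "88"
        System.out.println(countMatches("\\d+", input)); // 1
        // \d* also succeeds (with an empty match) at indexes 0, 1, 3, 4, 5 and 6
        System.out.println(countMatches("\\d*", input)); // 6
    }
}
```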

At index 6: the string is only six characters long, which means the last valid index is 5; how can it match something at index 6?

Answer: it isn't matching a character, it's matching the nothing after the last character. It might help if you think of it as being between the last character and the end of the string, since regexes let you match the end of a string with the '$' metacharacter.
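A quick check of that end-of-string position. This sketch (the class EndOfInputDemo and its lastMatchStart helper are hypothetical names) records where the last match of a pattern starts: for a88abc, both "\\d*" and the '$' anchor report index 6, which equals input.length(), one past the last valid character index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shows that a zero-width match can sit at index == input.length(),
// the position after the last character, where '$' also matches.
class EndOfInputDemo {

    static int lastMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = -1; // -1 means "no match at all"
        while (m.find()) {
            last = m.start();
        }
        return last;
    }

    public static void main(String[] args) {
        String input = "a88abc";
        System.out.println(input.length());                 // 6
        System.out.println(lastMatchStart("\\d*", input));  // 6
        System.out.println(lastMatchStart("$", input));     // 6
    }
}
```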

The HORRIBLE book (I hate it) says


I don't know about the book, but I agree that this part of it is horrible. This question is constantly being asked here because the authors did such a terrible job of explaining it.
 
Don't get me started on those stupid light bulbs.
 