
# Regular expression to check for specific special characters (any repetition) and 0 to 9 numbers

Rama Krishna
Ranch Hand
Posts: 110
I am unable to create a regular expression that checks for specific special characters (any number of repetitions) and 0 to 9 digits, in a string of 15 characters in total!

Checking that digits occur a specific number of times is as simple as \d{m,n}
and any number of occurrences of specific special characters such as '-', ' ', '(', ')' can be checked by creating a character set [\.|\-| \( | \)]?

How can we join these two together to create a regular expression that checks that any of the characters '-', ' ', '(', ')' may occur any number of times, but that there are anywhere between 0 and 9 digits? It would probably look something like ^[[\.|\-| \( | \)]? \d{0,9} ]\$? but this is wrong.

SOS
Rama

Henry Wong
author
Marshal
Posts: 21116
78

Sorry, I read what you said three times, and I can't seem to decipher it. Can you give some examples that should match? And some examples that should *not* match?

Henry

Rama Krishna
Ranch Hand
Posts: 110
Really sorry about that, I didn't understand it myself when I got it!

I am interested in a regular expression pattern that validates a 15-character string for the following:

1) can have anywhere between 0 to 9 numbers/digits

2) can contain zero or more of the following characters '-', ' ', '(', ')'

Thanks
Rama

Paul Clapham
Sheriff
Posts: 21107
32
As far as I can see, all 15-character strings satisfy those requirements. Example:

ABCDEFGHIJKLMNO

Satisfies rule (1) because it has zero digits.

Satisfies rule (2) because it has zero of those special characters.

Was there supposed to be a rule (3)? Like, "doesn't contain any other characters"?

Rama Krishna
Ranch Hand
Posts: 110
hmm, the third rule is:

3) can only have numbers and following characters '-', ' ', '(', ')'

Rama Krishna
Ranch Hand
Posts: 110
Am I missing any more information or is it really not that simple?

Mike Simmons
Ranch Hand
Posts: 3076
14
So, if you need 15 characters, and no more than 9 of them can be numbers, that means that there must be at least 6 characters that are '-', ' ', '(', ')' . Is that right? Or has there been a misunderstanding here?

The stuff about 0-9 sounds suspicious. Do you mean that digits such as 0, 1, 2, 3... 9 can occur any number of times between 0 and 9? Or did you mean that the digits 0, 1, 2, 3... 9 can occur any number of times?

The way I understand your rules, the following are all valid

"123456789()_()_"
"((((((111111111"
"12-34-5-6-7-8-9"
"---------------"

and these are invalid:
"1234567890-----" (too many digits)
"123456" (not enough characters)

Is this understanding correct?

Piet Verdriet
Ranch Hand
Posts: 266
Rama Krishna wrote:Am I missing any more information or is it really not that simple?

To summarize:

[1] the String must contain 15 characters;
[2] the String can have zero to nine occurrences of digits;
[3] the String can have zero or more of the following characters: '-', ' ', '(' or ')'
[4] the String can only contain characters mentioned in rule [2] and [3]

But rule [3] doesn't make much sense. If the String must contain 15 characters (rule [1]), and there can be a maximum of nine digits (rule [2]), then there must be at least six characters described in rule [3]. Yet you say "zero or more" in rule [3]...

Anyway, the way the rules are now defined, try something like this (untested!):

Rama Krishna
Ranch Hand
Posts: 110
Hi Mike,

You are absolutely right, and I have tested Piet Verdriet's regExp; it seems to work.

Piet,

I would really like to understand how the grouping works, i.e., how you got this working.
Could you take the trouble to explain the regExp to me, please?

I got these from running the regExp in Expresso:

Match a suffix but exclude it from the capture:
(?=(?:\\D*\\d){0,9})
where
(?:\\D*\\d){0,9}
meant Match an expression but don't capture it?
where
\D* stands for any character that is not a digit, any number of repetitions?
\d is for any digit
what are we capturing here?

[-() \\d]{15}
This is the only part that I understand: it means any character in the character class/set, between 0 and 15 repetitions.

Could you explain how you constructed the first part please.

Thanks all of you guys,
Rama

Piet Verdriet
Ranch Hand
Posts: 266
Rama Krishna wrote:...

Piet,

I would really like to understand how the grouping works, i.e., how you got this working.
Could you take the trouble to explain the regExp to me, please?

I got these from running the regExp in Expresso:

Match a suffix but exclude it from the capture:
(?=(?:\\D*\\d){0,9})
where
(?:\\D*\\d){0,9}
meant Match an expression but don't capture it?
where
\D* stands for any character that is not a digit, any number of repetitions?
\d is for any digit
what are we capturing here?

You're not capturing anything. The "(?=...)" part is called positive look ahead and "\D*\d" matches zero or more characters other than a digit, followed by a digit. So, "(\D*\d){0,9}" will match a string containing between 0 and 9 digits. But my regex won't work in all cases I now realize. It will also match strings where there are more than 9 digits in it, like this: "33333333333333-" (14 digits and a hyphen). To overcome this, you will need to do something like this:
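A minimal sketch of the problem described above and of one possible fix. Both full patterns are assumptions assembled from the fragments quoted in this thread (the lookahead `(?=(?:\\D*\\d){0,9})` and the class `[-() \\d]{15}`), so they are not necessarily the exact regexes that were posted; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DigitLimitDemo {
    // Assumed first attempt: the lookahead is not anchored, so {0,9} may stop
    // early (even after zero repetitions) and never inspects the later digits.
    static final Pattern LOOSE =
        Pattern.compile("^(?=(?:\\D*\\d){0,9})[-() \\d]{15}$");

    // Assumed fix: the trailing \D*$ forces the lookahead to reach the end of
    // the input, so a tenth digit anywhere makes the lookahead fail.
    static final Pattern STRICT =
        Pattern.compile("^(?=(?:\\D*\\d){0,9}\\D*$)[-() \\d]{15}$");

    static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        String tooManyDigits = "33333333333333-"; // 14 digits and a hyphen
        System.out.println("loose:  " + loose(tooManyDigits));
        System.out.println("strict: " + strict(tooManyDigits));
        System.out.println("strict: " + strict("-1234----56789-"));
    }
}
```

The only difference between the two patterns is the `\\D*$` at the end of the lookahead. Without it, the lookahead succeeds as soon as it has seen at most nine digits and simply ignores whatever follows.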

For a thorough explanation, Google for "regex look arounds" and study the normal regex meta characters.

http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/tutorial.html

Rama Krishna wrote:[-() \\d]{15}
This is the only part that I understand: it means any character in the character class/set, between 0 and 15 repetitions.
...

No, it matches only strings of length 15, no more and no less.
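A small sketch to illustrate the point, assuming the pattern is applied to the whole input with `Matcher.matches()`; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ExactLengthDemo {
    // {15} means exactly 15 repetitions; "between 0 and 15" would be {0,15}.
    static final Pattern FIFTEEN = Pattern.compile("[-() \\d]{15}");

    // matches() succeeds only if the entire input is consumed by the pattern.
    static boolean isFifteen(String s) {
        return FIFTEEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFifteen("123456"));           // 6 characters
        System.out.println(isFifteen("12-34-5-6-7-8-9"));  // 15 characters
        System.out.println(isFifteen("12-34-5-6-7-8-9-")); // 16 characters
    }
}
```

Note that `find()` behaves differently from `matches()`: it would locate a 15-character run inside a longer string, so the whole-input check matters here.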

Rama Krishna
Ranch Hand
Posts: 110
Cool,

The positive lookahead did the trick here. We are checking that the string begins with a positive lookahead for

meaning that in simpler examples as ones below (which I could understand):

is a non-capturing group in that it will match 42 in bug 42

is a zero width positive lookahead that will match var in and in not
where it matches for a character set called 'var' that ends with an '='

and a better example:

where \b is beginning or ending
matches a seven-letter-word that contains 'clip' as

eagerly checks looking ahead if there are exactly 7 characters with a \b meaning inside the bracket meaning ending with a character also.
The initial \b is meant for begin with?
Obviously matches ANY word containing 'clip'.

Along the same lines, essentially

can be broken down as: eagerly look ahead for 9 characters containing both non-numeric and numeric characters together, ending with non-numeric characters. So essentially this is where we are limiting the total of numeric characters to a maximum of 9 only, and the remainder can be non-numeric.

What I do not understand is how the below test case passes:
-1234----56789-

because we did mention that we have a positive look ahead for 9 numeric characters (can be mixed with non-numeric) characters, followed by non-numeric characters only!

I am using java pattern.matches(string) to test if the regular expression matches.

But overall, the string should only have a total of 15 characters from these special characters set which includes numeric characters.

Regards
Rama

Campbell Ritchie
Sheriff
Posts: 48940
60
Now becoming too difficult a question for us beginners. Moving.

Piet Verdriet
Ranch Hand
Posts: 266
Rama Krishna wrote:Cool,

The positive lookahead did the trick here. We are checking that the string begins with a positive lookahead for

Good.

Rama Krishna wrote:meaning that in simpler examples as ones below (which I could understand):

is a non-capturing group in that it will match 42 in bug 42

You probably mean the right thing, but your terminology is slightly off. It will match a string like "bug42" and because of the non-capturing group, it will only group "42". There is a big difference between "matching" and "grouping".

Rama Krishna wrote:is a zero width positive lookahead that will match var in and in not
where it matches for a character set called 'var' that ends with an '='

Correct.

Rama Krishna wrote:and a better example:

where \b is beginning or ending
matches a seven-letter-word that contains 'clip' as

eagerly checks looking ahead if there are exactly 7 characters with a \b meaning inside the bracket meaning ending with a character also.
The initial \b is meant for begin with?
Obviously matches ANY word containing 'clip'.

Along the same lines, essentially

can be broken down as: eagerly look ahead for 9 characters containing both non-numeric and numeric characters together, ending with non-numeric characters. So essentially this is where we are limiting the total of numeric characters to a maximum of 9 only, and the remainder can be non-numeric.

To be precise, \b matches a position (an empty string) that lies in between a "word" character and a "non-word" character.
But yes, the above is correct.
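A hedged sketch of the seven-letter-word idea discussed above. The pattern `\b(?=\w{7}\b)\w*clip\w*` is an assumption (the example regex is not shown in this thread), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // \b           : position between a word and a non-word character
    // (?=\w{7}\b)  : zero-width check that exactly 7 word characters follow
    // \w*clip\w*   : the part that actually consumes the word
    static final Pattern SEVEN_WITH_CLIP =
        Pattern.compile("\\b(?=\\w{7}\\b)\\w*clip\\w*");

    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = SEVEN_WITH_CLIP.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("the paperclip and clipper eclipse"));
    }
}
```

The lookahead consumes nothing, so after it succeeds the matcher is still at the start of the word and `\w*clip\w*` runs over the same seven characters again.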

Rama Krishna wrote:What I do not understand is how the below test case passes:
-1234----56789-

because we did mention that we have a positive look ahead for 9 numeric characters (can be mixed with non-numeric) characters, followed by non-numeric characters only!

I am using java pattern.matches(string) to test if the regular expression matches.

But overall, the string should only have a total of 15 characters from these special characters set which includes numeric characters.

Regards
Rama

Err, I don't understand why that string should be rejected.

[1] the String must contain 15 characters;
[2] the String can have zero to nine occurrences of digits;
[3] the String can have zero or more of the following characters: '-', ' ', '(' or ')'
[4] the String can only contain characters mentioned in rule [2] and [3]

AFAIK, the string "-1234----56789-" complies with all four rules.
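As a cross-check that does not depend on any regex, here is a minimal sketch that applies the four rules directly. It assumes "digit" means the ASCII characters 0 to 9; the class and method names are hypothetical.

```java
class RuleCheckDemo {
    static boolean compliesWithRules(String s) {
        // Rule [1]: exactly 15 characters.
        if (s.length() != 15) {
            return false;
        }
        int digits = 0;
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits++;                      // counted for rule [2]
            } else if ("- ()".indexOf(c) < 0) {
                return false;                  // rule [4]: nothing else allowed
            }
        }
        // Rule [2]: at most nine digits. Rule [3] ("zero or more") adds no
        // constraint of its own.
        return digits <= 9;
    }

    public static void main(String[] args) {
        System.out.println(compliesWithRules("-1234----56789-"));
        System.out.println(compliesWithRules("1234567890-----"));
    }
}
```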

Rama Krishna
Ranch Hand
Posts: 110
Sorry, I have been unable to communicate effectively and I will be working on it.

I did not mean to say that the test case is invalid and should not pass! Instead, what I meant to ask was how the regular expression passes the test case: -1234----56789-

In the example, assuming that we are matching the whole string from the beginning to the end as in pattern.matches(completestring) and not partial matches:

I could understand that the first \b is the beginning and the last \b is the ending, along the same lines as the regular expression

So ^(?=\w{7}\b) would do an eager lookup for 7 characters with the \b at the end saying that it has to end there too!

What I do not understand is how the below test case passes:
-1234----56789-

because we did mention that we have a positive look ahead for 9 numeric characters (can be mixed with non-numeric) characters, followed by non-numeric characters only!

I am using java pattern.matches(string) to test if the regular expression matches.

But overall, the string should only have a total of 15 characters from these special characters set which includes numeric characters.

broken down as

eagerly matches up to a maximum of 9 numeric characters or a total of 9 characters (containing both numeric and non-numeric characters) followed by any non-numeric characters. Whereas the test case -1234----56789-

Piet Verdriet
Ranch Hand
Posts: 266
Rama Krishna wrote:...

broken down as

eagerly matches up to a maximum of 9 numeric characters or a total of 9 characters (containing both numeric and non-numeric characters) followed by any non-numeric characters. Whereas the test case -1234----56789-

No, that is not correct. The regex:

will match any string that has less than 10 digits in it.
Note that there is a star behind both the \D classes. And the \d must be present {0,9} times.
In short:
- it will match an empty string (it has zero \D and it has zero \d)
- it will match a string of arbitrary length (\D*) containing no digits (\d{0,9})
- it will NOT match a string with 10 digits (or more)
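The three cases above can be tried in isolation with a small sketch. The pattern `(?:\D*\d){0,9}\D*` is an assumption reconstructed from the description (a star after both `\D` classes, `\d` present `{0,9}` times), applied to the whole input with `matches()`; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DigitCountDemo {
    // Each repetition of (?:\D*\d) consumes any run of non-digits plus ONE
    // digit, so {0,9} limits the number of digits, not the number of
    // characters. The final \D* absorbs trailing non-digits.
    static final Pattern AT_MOST_NINE_DIGITS =
        Pattern.compile("(?:\\D*\\d){0,9}\\D*");

    static boolean atMostNineDigits(String s) {
        return AT_MOST_NINE_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(atMostNineDigits(""));                 // empty string
        System.out.println(atMostNineDigits("no digits at all")); // arbitrary length
        System.out.println(atMostNineDigits("1234567890"));       // 10 digits
        System.out.println(atMostNineDigits("-1234----56789-"));  // 15 chars, 9 digits
    }
}
```

For -1234----56789- the nine repetitions consume "-1", "2", "3", "4", "----5", "6", "7", "8" and "9", and the final `\D*` takes the last hyphen, so all 15 characters are covered while only nine digits are counted.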

Rama Krishna
Ranch Hand
Posts: 110

is limiting the whole word to have only 7 characters, i.e.,

is applied on the complete word because of the \b at the end causing a boundary

so the complete regExp enforces that it cannot have a word of size other than 7 and should contain 'clip'. Along the same lines, I was thinking that

matches up to a maximum of 9 digits or a total of 9 characters (containing both digit and non-digit characters) followed by any non-digit characters which is same as what you said:

"will match any string that has less than 10 digits in it.
In short:
- it will match an empty string (it has zero \D and it has zero \d)
- it will match a string of arbitrary length (\D*) containing no digits (\d{0,9})
- it will NOT match a string with 10 digits (or more)
"

But, the test case has a total of 15 characters
-1234----56789-
so I could not understand how it is imposing this less than 10 digit limitation on the complete 15 characters.

Regards
Krishna

Piet Verdriet
Ranch Hand
Posts: 266
Rama Krishna wrote:...

so I could not understand how it is imposing this less than 10 digit limitation on the complete 15 characters.

Sorry, although I have tried, it seems I am not able to explain this to you.

Good luck though.