
Regular expression to check for specific special characters (any repetition) and 0 to 9 numbers

 
Ranch Hand
Posts: 110
I am unable to create a regular expression that checks for specific special characters (any number of repetitions) and 0 to 9 digits, with a total of 15 characters!

Checking that digits occur a specific number of times is as simple as \d{m,n}, and I tried to check for any number of occurrences of specific special characters such as '-', ' ', '(', ')' by creating a character set: [\.|\-| \( | \)]?

How can we join these two together into a regular expression that allows any of the characters '-', ' ', '(', ')' any number of times, but requires anywhere between 0 and 9 digits? It would probably look something like ^[[\.|\-| \( | \)]? \d{0,9} ]$? but this is wrong.

SOS
Rama
 
author
Posts: 23956

Sorry, I read what you said three times, and I can't seem to decipher it. Can you give some examples that should match? And some examples that should *not* match?

Henry
 
Rama Krishna
Ranch Hand
Posts: 110
Really sorry about that; I didn't understand it myself when I got it!

I am interested in a regular expression pattern that validates a 15-character string against the following rules:

1) can have anywhere between 0 and 9 digits

2) can contain zero or more of the following characters '-', ' ', '(', ')'

Thanks
Rama
 
Marshal
Posts: 28263
As far as I can see, all 15-character strings satisfy those requirements. Example:

ABCDEFGHIJKLMNO

Satisfies rule (1) because it has zero digits.

Satisfies rule (2) because it has zero of those special characters.

Was there supposed to be a rule (3)? Like, "doesn't contain any other characters"?
 
Rama Krishna
Ranch Hand
Posts: 110
Hmm, the third rule is:

3) can only have digits and the following characters '-', ' ', '(', ')'
 
Rama Krishna
Ranch Hand
Posts: 110
Am I missing any more information or is it really not that simple?
 
Master Rancher
Posts: 4921
So, if you need 15 characters, and no more than 9 of them can be numbers, that means that there must be at least 6 characters that are '-', ' ', '(', ')' . Is that right? Or has there been a misunderstanding here?

The stuff about 0-9 sounds suspicious. Do you mean that digits such as 0, 1, 2, 3... 9 can occur any number of times between 0 and 9? Or did you mean that the digits 0, 1, 2, 3... 9 can occur any number of times?

The way I understand your rules, the following are all valid

"123456789()_()_"
"((((((111111111"
"12-34-5-6-7-8-9"
"---------------"

and these are invalid:
"1234567890-----" (too many digits)
"123456" (not enough characters)

Is this understanding correct?
 
Ranch Hand
Posts: 266

Rama Krishna wrote:Am I missing any more information or is it really not that simple?



To summarize:

[1] the String must contain 15 characters;
[2] the String can have zero to nine occurrences of digits;
[3] the String can have zero or more of the following characters: '-', ' ', '(' or ')'
[4] the String can only contain characters mentioned in rule [2] and [3]

But rule [3] doesn't make much sense. If the String must contain 15 characters (rule [1]), and there can be a maximum of nine digits (rule [2]), then there must be at least six characters described in rule [3]. Yet you say "zero or more" in rule [3]...

Anyway, the way the rules are now defined, try something like this (untested!):
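The snippet posted at this point is not preserved in the thread. Judging from the two fragments quoted in the next reply, (?=(?:\\D*\\d){0,9}) and [-() \\d]{15}, it was probably close to the pattern below. This is a reconstruction under that assumption, not the original post; the class and method names are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.util.regex.Pattern;

// Reconstruction (assumption): a lookahead for "up to nine digits", then
// exactly 15 characters drawn from digits, '-', ' ', '(' and ')'.
final class FirstAttempt {
    static final Pattern PATTERN =
            Pattern.compile("^(?=(?:\\D*\\d){0,9})[-() \\d]{15}$");

    static boolean isValid(String s) {
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12-34-5-6-7-8-9")); // 15 characters, 9 digits
        System.out.println(isValid("-".repeat(15)));    // 15 characters, 0 digits
        System.out.println(isValid("123456"));          // only 6 characters
    }
}
```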

 
Rama Krishna
Ranch Hand
Posts: 110
Hi Mike,

You are absolutely right, and I have tested Piet Verdriet's regex; it seems to work.

Piet,

I would really like to understand how the grouping works, i.e., how you got this working.
Could you please take the trouble to explain the regex to me?

I got these from running the regex in EXPRESSO:

Match a suffix but exclude it from the capture:
(?=(?:\\D*\\d){0,9})
where
(?:\\D*\\d){0,9}
means "match an expression but don't capture it"?
where
\D* stands for any character that is not a digit, any number of repetitions?
\d is for any digit
What are we capturing here?

[-() \\d]{15}
This is the only part I understand: it means any character in the character class/set, with between 0 and 15 repetitions.

Could you explain how you constructed the first part please.

Thanks all of you guys,
Rama




 
Piet Verdriet
Ranch Hand
Posts: 266

Rama Krishna wrote:...

Piet,

I would really like to understand how the grouping works, i.e., how you got this working.
Could you please take the trouble to explain the regex to me?

I got these from running the regex in EXPRESSO:

Match a suffix but exclude it from the capture:
(?=(?:\\D*\\d){0,9})
where
(?:\\D*\\d){0,9}
means "match an expression but don't capture it"?
where
\D* stands for any character that is not a digit, any number of repetitions?
\d is for any digit
What are we capturing here?



You're not capturing anything. The "(?=...)" part is called a positive lookahead, and "\D*\d" matches zero or more characters other than a digit, followed by a digit. So "(\D*\d){0,9}" will match a string containing between 0 and 9 digits. But I now realize my regex won't work in all cases: the lookahead is satisfied as soon as it finds a prefix with at most 9 digits, so it will also accept strings with more than 9 digits in them, like this: "33333333333333-" (14 digits and a hyphen). To overcome this, you will need to do something like this:
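The corrected pattern is also missing from the preserved thread. Later replies mention a star after both \D classes and non-digits following the repeated group, which suggests the lookahead was closed with \D*$ so that it has to reach the end of the input. The sketch below assumes that reading and compares it with the unanchored version; both patterns and all names are reconstructions, not the original post.

```java
import java.util.regex.Pattern;

final class DigitLimit {
    // Unanchored lookahead: satisfied by any prefix, even an empty one,
    // so it does not limit the number of digits at all.
    static final Pattern UNANCHORED =
            Pattern.compile("^(?=(?:\\D*\\d){0,9})[-() \\d]{15}$");

    // Anchored lookahead (assumed fix): at most nine digits, then only
    // non-digits up to the end of the input.
    static final Pattern ANCHORED =
            Pattern.compile("^(?=(?:\\D*\\d){0,9}\\D*$)[-() \\d]{15}$");

    public static void main(String[] args) {
        String fourteenDigits = "33333333333333-";
        System.out.println(UNANCHORED.matcher(fourteenDigits).matches());   // true
        System.out.println(ANCHORED.matcher(fourteenDigits).matches());     // false
        System.out.println(ANCHORED.matcher("-1234----56789-").matches());  // true
    }
}
```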



For a thorough explanation, Google for "regex look arounds" and study the normal regex meta characters.

Or, better yet, read these:
http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/tutorial.html

Rama Krishna wrote:[-() \\d]{15}
This is the only part I understand: it means any character in the character class/set, with between 0 and 15 repetitions.
...



No, it matches only strings of length 15, no more and no less.
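A small check of that statement, as a sketch: {15} means exactly 15 repetitions, so a whole-input match only succeeds for 15 allowed characters. Note that this holds for matches(); a substring search with find() would still succeed inside a longer string. The class and method names are illustrative, and String.repeat assumes Java 11 or later.

```java
import java.util.regex.Pattern;

final class ExactLength {
    static final Pattern FIFTEEN = Pattern.compile("[-() \\d]{15}");

    // Whole-input match: true only for exactly 15 allowed characters.
    static boolean matchesWhole(String s) {
        return FIFTEEN.matcher(s).matches();
    }

    // Substring search: true whenever 15 allowed characters occur in a row.
    static boolean foundInside(String s) {
        return FIFTEEN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("1".repeat(14))); // false
        System.out.println(matchesWhole("1".repeat(15))); // true
        System.out.println(matchesWhole("1".repeat(16))); // false
        System.out.println(foundInside("1".repeat(16)));  // true
    }
}
```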
 
Rama Krishna
Ranch Hand
Posts: 110
Cool,

The positive lookahead did the trick here. We are checking that the string begins with a positive lookahead for


meaning that in simpler examples as ones below (which I could understand):

is a non-capturing group in that it will match 42 in bug 42

is a zero width positive lookahead that will match var in and in not
where it matches for a character set called 'var' that ends with an '='

and a better example:

where \b is beginning or ending
matches a seven-letter-word that contains 'clip' as

eagerly checks looking ahead if there are exactly 7 characters with a \b meaning inside the bracket meaning ending with a character also.
The initial \b is meant for begin with?
Obviously matches ANY word containing 'clip'.

In the same lines, essentially

can be broken down as: eagerly look ahead for 9 characters, containing both non-numeric and numeric characters together, ending with non-numeric characters. So essentially this is where we limit the total number of numeric characters to a maximum of 9, and the remainder can be non-numeric.

What I do not understand is how the test case below passes:
-1234----56789-

because we did say that we have a positive lookahead for 9 numeric characters (which can be mixed with non-numeric characters), followed by non-numeric characters only!

I am using Java's Pattern.matches to test whether the regular expression matches.

But overall, the string should only have a total of 15 characters from this set of special characters, which includes numeric characters.


Regards
Rama
 
Marshal
Posts: 79468
Now becoming too difficult a question for us beginners. Moving.
 
Piet Verdriet
Ranch Hand
Posts: 266

Rama Krishna wrote:Cool,

The positive lookahead did the trick here. We are checking that the string begins with a positive lookahead for



Good.

Rama Krishna wrote:meaning that in simpler examples as ones below (which I could understand):

is a non-capturing group in that it will match 42 in bug 42



You probably mean the right thing, but your terminology is slightly off. It will match a string like "bug42" and because of the non-capturing group, it will only group "42". There is a big difference between "matching" and "grouping".
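The pattern under discussion is not preserved, so the following only illustrates the general distinction with two hypothetical patterns, bug(\d+) and bug(?:\d+). Both match the same input; the difference is whether the grouped part is stored for later retrieval.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Number of capturing groups after a whole-input match,
    // or -1 when the regex does not match the input.
    static int groupsAfterMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.groupCount() : -1;
    }

    public static void main(String[] args) {
        Matcher capturing = Pattern.compile("bug(\\d+)").matcher("bug42");
        if (capturing.matches()) {
            System.out.println(capturing.group());  // the whole match: bug42
            System.out.println(capturing.group(1)); // the captured group: 42
        }
        // Same match, but (?:...) groups without storing anything.
        System.out.println(groupsAfterMatch("bug(\\d+)", "bug42"));   // 1
        System.out.println(groupsAfterMatch("bug(?:\\d+)", "bug42")); // 0
    }
}
```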

Rama Krishna wrote:is a zero width positive lookahead that will match var in and in not
where it matches for a character set called 'var' that ends with an '='



Correct.

Rama Krishna wrote:and a better example:

where \b is beginning or ending
matches a seven-letter-word that contains 'clip' as

eagerly checks looking ahead if there are exactly 7 characters with a \b meaning inside the bracket meaning ending with a character also.
The initial \b is meant for begin with?
Obviously matches ANY word containing 'clip'.

In the same lines, essentially

can be broken down as: eagerly look ahead for 9 characters, containing both non-numeric and numeric characters together, ending with non-numeric characters. So essentially this is where we limit the total number of numeric characters to a maximum of 9, and the remainder can be non-numeric.



To be precise, \b matches a position (an empty string) that lies in between a "word" character and a "non-word" character.
But yes, the above is correct.
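The 'clip' example itself is missing from the preserved thread. A pattern of the kind being described, assuming it resembled the (?=\w{7}\b) fragment quoted later, could look like the sketch below. The exact pattern and the names are assumptions; note also that \w accepts digits and underscores, so "seven-letter" is slightly loose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SevenLetterWord {
    // \b           : start at a word boundary
    // (?=\w{7}\b)  : look ahead for exactly seven word characters, consuming nothing
    // \w*clip\w*   : then actually match a word that contains "clip"
    static final Pattern CLIP_WORD =
            Pattern.compile("\\b(?=\\w{7}\\b)\\w*clip\\w*");

    // Returns the first seven-character word containing "clip", or null.
    static String firstMatch(String text) {
        Matcher m = CLIP_WORD.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a total eclipse")); // eclipse
        System.out.println(firstMatch("paperclip"));       // null (nine characters)
        System.out.println(firstMatch("clips"));           // null (five characters)
    }
}
```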

Rama Krishna wrote:What I do not understand is how the test case below passes:
-1234----56789-

because we did say that we have a positive lookahead for 9 numeric characters (which can be mixed with non-numeric characters), followed by non-numeric characters only!

I am using Java's Pattern.matches to test whether the regular expression matches.

But overall, the string should only have a total of 15 characters from this set of special characters, which includes numeric characters.


Regards
Rama



Err, I don't understand why that string should be rejected.

Here are your rules again:

[1] the String must contain 15 characters;
[2] the String can have zero to nine occurrences of digits;
[3] the String can have zero or more of the following characters: '-', ' ', '(' or ')'
[4] the String can only contain characters mentioned in rule [2] and [3]


AFAIK, the string "-1234----56789-" complies with all four rules.
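The four rules can also be checked without any regex, which makes it easy to confirm by hand that the string complies. This is only an illustrative sketch; the class and method names are not from the thread.

```java
final class RuleCheck {
    static boolean satisfiesRules(String s) {
        if (s.length() != 15) {
            return false;                          // rule [1]: exactly 15 characters
        }
        int digits = 0;
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits++;                          // counted for rule [2]
            } else if ("- ()".indexOf(c) < 0) {
                return false;                      // rule [4]: no other characters
            }
        }
        return digits <= 9;                        // rule [2]: zero to nine digits
    }

    public static void main(String[] args) {
        System.out.println(satisfiesRules("-1234----56789-")); // true
        System.out.println(satisfiesRules("1234567890-----")); // false, ten digits
    }
}
```

Rule [3] needs no check of its own: "zero or more" of the special characters is always satisfied once rule [4] holds.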
 
Rama Krishna
Ranch Hand
Posts: 110
Sorry, I have been unable to communicate effectively, and I will work on it.

I did not mean to say that the test case is invalid and should not pass! Instead, what I meant to ask was how the regular expression passes the test case: -1234----56789-

In the example, assuming that we are matching the whole string from beginning to end, as in pattern.matches(completestring), and not partial matches:

I could understand that the first \b is the beginning and the last \b is the ending, along the same lines as the regular expression

So ^(?=\w{7}\b) would do an eager lookahead for 7 characters, with the \b at the end saying that it has to end there too!


What I do not understand is how the test case below passes:
-1234----56789-

because we did say that we have a positive lookahead for 9 numeric characters (which can be mixed with non-numeric characters), followed by non-numeric characters only!

I am using Java's Pattern.matches to test whether the regular expression matches.

But overall, the string should only have a total of 15 characters from this set of special characters, which includes numeric characters.






broken down as

eagerly matches up to a maximum of 9 numeric characters or a total of 9 characters (containing both numeric and non-numeric characters) followed by any non-numeric characters. Whereas the test case -1234----56789-


 
Piet Verdriet
Ranch Hand
Posts: 266

Rama Krishna wrote:...



broken down as

eagerly matches up to a maximum of 9 numeric characters or a total of 9 characters (containing both numeric and non-numeric characters) followed by any non-numeric characters. Whereas the test case -1234----56789-



No, that is not correct. The regex:



will match any string that has fewer than 10 digits in it.
Note that there is a star after both of the \D classes, and the \d must be present {0,9} times.
In short:
- it will match an empty string (it has zero \D and it has zero \d)
- it will match a string of arbitrary length (\D*) containing no digits (\d{0,9})
- it will NOT match a string with 10 digits (or more)
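The regex referred to here is not preserved. From the description (a star after both \D classes, \d present {0,9} times) it was presumably (?:\D*\d){0,9}\D* applied to the whole input. Under that assumption, the three cases in the list can be checked like this; the names are illustrative.

```java
import java.util.regex.Pattern;

final class DigitCountPattern {
    // Assumed form: up to nine "non-digits then one digit" groups,
    // followed by any number of trailing non-digits.
    static final Pattern AT_MOST_NINE_DIGITS =
            Pattern.compile("(?:\\D*\\d){0,9}\\D*");

    static boolean hasAtMostNineDigits(String s) {
        return AT_MOST_NINE_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasAtMostNineDigits(""));                // true
        System.out.println(hasAtMostNineDigits("no digits ()--"));  // true
        System.out.println(hasAtMostNineDigits("1234567890"));      // false
        System.out.println(hasAtMostNineDigits("-1234----56789-")); // true
    }
}
```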
 
Rama Krishna
Ranch Hand
Posts: 110


is limiting the whole word to have only 7 characters, i.e.,

is applied on the complete word because of the \b at the end causing a boundary

so the complete regex enforces that the word cannot have a length other than 7 and must contain 'clip'. Along the same lines, I was thinking that



matches up to a maximum of 9 digits, or a total of 9 characters (containing both digit and non-digit characters), followed by any non-digit characters, which is the same as what you said:

"will match any string that has less than 10 digits in it.
In short:
- it will match an empty string (it has zero \D and it has zero \d)
- it will match a string of arbitrary length (\D*) containing no digits (\d{0,9})
- it will NOT match a string with 10 digits (or more)
"



But the test case has a total of 15 characters:
-1234----56789-
so I could not understand how it imposes this fewer-than-10-digits limit on the complete 15 characters.

Regards
Krishna

 
Piet Verdriet
Ranch Hand
Posts: 266

Rama Krishna wrote:...

so I could not understand how it is imposing this less than 10 digit limitation on the complete 15 characters.



Sorry, although I have tried, it seems I am not able to explain this to you.

Good luck though.
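For readers left with the same question, one way to see how the limit applies to all 15 characters is to list what each repetition of \D*\d consumes. The sketch assumes the lookahead had the form (?=(?:\D*\d){0,9}\D*$), which is a reconstruction from this thread; the names are illustrative. Each repetition swallows any run of non-digits plus one digit, so nine repetitions can span far more than nine characters, and the final \D*$ only succeeds if nothing but non-digits is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadTrace {
    // Splits the input into the pieces that successive "\D*\d" repetitions
    // would consume. More than nine pieces means more than nine digits.
    static List<String> digitChunks(String s) {
        List<String> chunks = new ArrayList<>();
        Matcher m = Pattern.compile("\\D*\\d").matcher(s);
        while (m.find()) {
            chunks.add(m.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Nine chunks: "-1", "2", "3", "4", "----5", "6", "7", "8", "9";
        // the trailing "-" is left for \D*, and then $ matches.
        System.out.println(digitChunks("-1234----56789-"));
        // Ten chunks: after nine repetitions a digit remains, so \D*$ fails.
        System.out.println(digitChunks("1234567890-----").size());
    }
}
```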