Regular Expression

 
Ranch Hand
Posts: 33
Does anyone know how this expression works, especially the \\ (backslash) parts?

(^[A-Za-z][A-Za-z\\'\\-]+([\\A-Za-z][A-Za-z\\'\\-]+)*",pattern,CASE_INSENSITIVE);
 
Saloon Keeper
Posts: 7185
There's likely an easier way to express the intent, whatever it is.

Instead of "A-Za-z" it should use "\p{Alpha}", or possibly "\p{L}" if non-ASCII letters are possible.
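
To make the difference between those two classes concrete, here is a small sketch. The class name AlphaVsLetter and the sample strings are made up for illustration. It relies on the default behaviour of java.util.regex, where \p{Alpha} covers only ASCII letters (unless the UNICODE_CHARACTER_CLASS flag is set) and \p{L} covers any Unicode letter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative only.
class AlphaVsLetter {
    static final Pattern ASCII_LETTERS = Pattern.compile("\\p{Alpha}+");
    static final Pattern ANY_LETTERS = Pattern.compile("\\p{L}+");

    // True when every character is an ASCII letter.
    static boolean asciiOnly(String s) {
        return ASCII_LETTERS.matcher(s).matches();
    }

    // True when every character is a letter in any script.
    static boolean anyLetter(String s) {
        return ANY_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The second sample ends in e-with-diaeresis, a non-ASCII letter.
        String[] samples = { "Zoe", "Zo\u00eb" };
        for (String s : samples) {
            System.out.println(s + " Alpha: " + asciiOnly(s) + ", L: " + anyLetter(s));
        }
    }
}
```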

' is not a special character in regexps, so \\' could just read '.

Same for \\-, as long as it's at the end of the character class.

The \\A is suspect, it should probably just read A.
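
A quick way to convince yourself that the escapes on ' and - are redundant is to compile both spellings and compare them on a few inputs. This is only a sketch: the class name EscapeDemo and the sample strings are invented here, and it deliberately leaves the \\A question aside.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original poster's code.
class EscapeDemo {
    // As in the question: quote and dash both escaped (regex text: [A-Za-z\'\-]+).
    static final Pattern ESCAPED = Pattern.compile("[A-Za-z\\'\\-]+");
    // Without the escapes; the dash stays literal because it is last in the class.
    static final Pattern PLAIN = Pattern.compile("[A-Za-z'-]+");

    // True when both spellings give the same verdict for the whole input.
    static boolean agree(String s) {
        return ESCAPED.matcher(s).matches() == PLAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "O'Neil", "Jean-Luc", "a+b", "" }) {
            System.out.println("\"" + s + "\" same verdict: " + agree(s));
        }
    }
}
```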

If you want to play with Java regexps, try a site like https://www.regexplanet.com/advanced/java/index.html
 
Saloon Keeper
Posts: 13430
In addition to all that Tim said, it makes no sense to use both uppercase and lowercase letters in your character classes if you're going to use the CASE_INSENSITIVE option.
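
As a rough illustration of that point (the class name CaseFlagDemo and the inputs are made up), a lowercase-only class behaves the same as the mixed-case one once CASE_INSENSITIVE is set:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class CaseFlagDemo {
    static final Pattern LOWER_ONLY =
            Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
    static final Pattern BOTH_CASES =
            Pattern.compile("[A-Za-z]+", Pattern.CASE_INSENSITIVE);

    static boolean lowerOnly(String s) {
        return LOWER_ONLY.matcher(s).matches();
    }

    static boolean bothCases(String s) {
        return BOTH_CASES.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "hello", "HeLLo", "HELLO", "hello1" }) {
            System.out.println(s + " -> " + lowerOnly(s) + " / " + bothCases(s));
        }
    }
}
```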

If we simplify your regex a little bit, we end up with this (I added spaces for clarity; you can ignore them by using the COMMENTS option):

^ [a-z][a-z ' -]+ ( [a-z][a-z ' -]+ )*

We can further simplify this regex, because "X (X)*" is the same thing as just "(X)+". Here, "X" is "[a-z][a-z ' -]+":

^ ( [a-z][a-z ' -]+ )+
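
One way to check that rewrite is to run both forms over a few inputs and compare what they match. The following is a sketch under assumptions: the class name GroupRewriteDemo, the helper names, and the sample strings are all invented, and X is written without the cosmetic spaces so the COMMENTS flag is not needed.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class name, helpers and inputs are not from the thread.
class GroupRewriteDemo {
    // X as quoted in the post, written without the cosmetic spaces.
    static final String X = "[a-z][a-z'-]+";

    static final Pattern LONG_FORM =
            Pattern.compile("^" + X + "(" + X + ")*", Pattern.CASE_INSENSITIVE);
    static final Pattern SHORT_FORM =
            Pattern.compile("^(" + X + ")+", Pattern.CASE_INSENSITIVE);

    // Returns the matched prefix, or null when the pattern does not match.
    static String matched(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // True when both forms match the same prefix, or both fail.
    static boolean sameOutcome(String input) {
        return Objects.equals(matched(LONG_FORM, input), matched(SHORT_FORM, input));
    }

    public static void main(String[] args) {
        for (String s : new String[] { "aaa-bbb", "O'Neil", "ab1", "a", "-abc" }) {
            System.out.println(s + " -> " + matched(SHORT_FORM, s)
                    + ", same: " + sameOutcome(s));
        }
    }
}
```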

This regular expression means:

The beginning of the string, followed by one or more groups of (a letter followed by one or more letters, quotes or dashes).

If we use Tim's suggestion to use the Unicode "letter" category instead of just ASCII letters, the final result looks like this:

^ ( \p{L} [\p{L} ' -]+ )+
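
For what it's worth, here is how a pattern of that shape might be used from Java. This is a sketch, not anyone's production code: the class name NameCheck, the method name, and the sample names are hypothetical, and the pattern is written compactly rather than with the COMMENTS option.

```java
import java.util.regex.Pattern;

// Hypothetical usage sketch; names and inputs are illustrative only.
class NameCheck {
    // A letter, then one or more letters, quotes or dashes; repeated one or more times.
    static final Pattern NAME = Pattern.compile("^(\\p{L}[\\p{L}'-]+)+");

    // True when the input starts with text of that shape.
    static boolean startsLikeName(String s) {
        return NAME.matcher(s).find();
    }

    public static void main(String[] args) {
        // The third sample ends in a non-ASCII letter (e with diaeresis).
        for (String s : new String[] { "O'Neil", "Jean-Luc", "Zo\u00eb", "9lives", "a" }) {
            System.out.println(s + " -> " + startsLikeName(s));
        }
    }
}
```

Note that the pattern is only anchored at the start, so find() accepts any input that begins with a valid prefix; use matches() or add $ if the whole string has to fit.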
 
Stephan van Hulst
Saloon Keeper
Posts: 13430
I just realized you can simplify the regular expression even further.

Let's say you have the string "aaa-bbb". The first "a" obviously matches the character class [a-z], and the second "a" matches the character class [a-z ' -]. There is absolutely no way to tell if the third "a" matches the [a-z ' -] of the old group, or the [a-z] of a new group.

This means that the following regular expression matches the same strings as your original:

^ [a-z] [a-z ' -]+
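
That claim can be spot-checked against the pattern from the question. The sketch below makes two assumptions that are not from the thread: the stray leading parenthesis in the posted pattern is dropped, and the suspect backslash before the second A is removed so it reads as a plain A. The class name OriginalVsSimplified and the sample inputs are invented.

```java
import java.util.regex.Pattern;

// Illustrative comparison; see the assumptions stated above the block.
class OriginalVsSimplified {
    // Assumed reading of the posted pattern: leading "(" dropped, backslash before A removed.
    static final Pattern ORIGINAL = Pattern.compile(
            "^[A-Za-z][A-Za-z\\'\\-]+([A-Za-z][A-Za-z\\'\\-]+)*",
            Pattern.CASE_INSENSITIVE);

    // A letter followed by one or more letters, quotes or dashes.
    static final Pattern SIMPLIFIED =
            Pattern.compile("^[a-z][a-z'-]+", Pattern.CASE_INSENSITIVE);

    // True when both patterns accept the input or both reject it.
    static boolean agree(String input) {
        return ORIGINAL.matcher(input).find() == SIMPLIFIED.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "aaa-bbb", "O'Neil", "a", "'ab", "x-", "7up" }) {
            System.out.println(s + " -> simplified: " + SIMPLIFIED.matcher(s).find()
                    + ", agree: " + agree(s));
        }
    }
}
```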
 
Saloon Keeper
Posts: 8779
The original regex has mismatched parentheses.