regex for nameFields: first & last names tested separately

Unnsse Khan
Ranch Hand

Joined: Nov 12, 2001
Posts: 511
Hello again,

I am looking for a good regex pattern for first and last names...

Have two TextFields (one for a person's first name and one for the person's last name).

The rules I want to specify are:

1. First letter is always capital and all subsequent letters are lowercase.

2. No symbols (e.g. !#@$%^&*()_+=) except for a hyphen (-) are allowed. I only want letters (A-Z, a-z).

3. No numbers are allowed.

4. I want it to only test for potential first and last names but with no spaces...

Found this on the Internet:


But Eclipse doesn't like the escape sequences...

When I tried:

Eclipse's problems view spat out:

Many, many thanks!
[ March 20, 2007: Message edited by: Unnsse Khan ]
Jeanne Boyarsky
author & internet detective

Joined: May 26, 2003
Posts: 33129

In Java, the backslash is a special character. So you need to replace each \ with \\ to make Java happy. The actual regular expression will then contain a single backslash; it will just be visually represented as two.

This gives you:
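The pattern from the original post is not preserved in this thread, so here is a minimal sketch of the escaping rule using a hypothetical pattern (the class name and the regex are assumptions for illustration only). Each regex backslash is written twice in the Java source:

```java
import java.util.regex.Pattern;

class BackslashEscapeDemo {
    // Hypothetical pattern, NOT the one from the original post:
    // one or more ASCII letters, optionally followed by a literal period.
    // The regex itself is ^[a-zA-Z]+\.?$ ; the Java literal doubles the backslash.
    static final String REGEX = "^[a-zA-Z]+\\.?$";

    static boolean matches(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        // Printing the string shows the single backslash the regex engine sees.
        System.out.println(REGEX);
        System.out.println(matches("John"));
        System.out.println(matches("John."));
        System.out.println(matches("John!"));
    }
}
```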

Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
A good thing to keep in mind is that regex strings are always "compiled" into Pattern objects at runtime. When you get a compile-time error about escape sequences, it isn't talking about your regex syntax, it's talking about your String literal syntax. As Jeanne said, it's telling you that you failed to escape your backslashes.
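To make the distinction concrete, here is a small sketch (class and method names are made up for illustration). A bad String literal is rejected by the compiler, while a bad regex only fails when Pattern.compile runs:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralVersusRegexDemo {
    // Returns null if the regex compiles, otherwise the runtime error description.
    static String regexError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // String bad = "\.";  // would not compile: invalid escape in a String literal
        String literal = "\\."; // valid literal; the string holds backslash + period
        System.out.println(literal.length());
        // A well-formed regex: no runtime error.
        System.out.println(regexError(literal));
        // A valid String literal but an invalid regex: the error shows up at runtime.
        System.out.println(regexError("[a-z"));
    }
}
```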

In this case, however, you could just as easily fix the error by removing the backslashes. Of the four escaped characters in that regex, only the hyphen has a special meaning inside a character class, but not if it's the first or last character listed. The period loses its usual special meaning, and the apostrophe and the comma never had them. And we should never pass up a chance to reduce the number of backslashes in our regexes. ^_^
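As a rough check of that claim, the sketch below compares an escaped character class with an unescaped one that lists the hyphen last. The class name is hypothetical, and the two character classes are stand-ins, since the original regex is not preserved here:

```java
import java.util.regex.Pattern;

class CharClassEscapeDemo {
    // Every character escaped (written with doubled backslashes in Java source).
    static final Pattern ESCAPED = Pattern.compile("[\\'\\,\\.\\-]");
    // No escapes; the hyphen is last, so it is a literal rather than a range operator.
    static final Pattern PLAIN = Pattern.compile("[',.-]");

    static boolean escapedMatches(String s) {
        return ESCAPED.matcher(s).matches();
    }

    static boolean plainMatches(String s) {
        return PLAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"'", ",", ".", "-", "a", "+"}) {
            System.out.println(s + " escaped=" + escapedMatches(s)
                    + " plain=" + plainMatches(s));
        }
    }
}
```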

It's not really a big deal in a situation like this, where the regex is applied in response to user input and the target strings are relatively short, but that regex is much more complicated and inefficient than it needs to be. The intention is obviously to make sure every punctuation character is followed by at least one letter; here's a clearer, quicker way to express that:

I have a much bigger problem with the notion of applying arbitrary, simplistic validation rules to name-entry fields. No computer has yet told me that I'm misspelling my own name, but if one ever does, the owner of that computer will not be doing any business with me if I have any choice in the matter. ^_^
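The simpler pattern itself did not survive in this copy of the thread. Judging by the \p{L} variant quoted further down, it probably had the shape below; treat this as a sketch under that assumption, with a made-up class name. It requires every punctuation character to be followed by at least one letter:

```java
import java.util.regex.Pattern;

class SimplerNamePattern {
    // Assumed shape, modeled on the \p{L} variant quoted later in the thread:
    // letters, then zero or more groups of (one punctuation mark + letters).
    // The possessive quantifiers (++ and *+) tell the engine not to backtrack.
    static final Pattern NAME =
            Pattern.compile("^[a-zA-Z]++(?:[',.-][a-zA-Z]++)*+$");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Smith", "O'Brien", "Smith-Jones", "Smith-", "John D."};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Note that this shape accepts neither spaces nor trailing punctuation, so an input such as "John D." is rejected.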
Unnsse Khan
Ranch Hand

Joined: Nov 12, 2001
Posts: 511
Jeanne & Alan,

Thanks for all of the great advice...

I just tried this regex in my application, and I must say that I do have a complaint.

If you type in:

John D.

and hit enter...

the validation breaks!

I want the ability to have dashes (-) and dots (.) inside my regex fields...

Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
The regex is very ASCII-ish. Since Java supports Unicode, it would be just as easy to use "[\\p{L}]" instead of "[a-zA-Z]", and thus not annoy people with umlauts in their names. As Alan said, the validation had better not tell me that I'm misspelling my name if I'm not.
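A short sketch of the difference (the class name and sample names are illustrative). The umlaut is written as a Unicode escape so the example does not depend on the source file's encoding:

```java
import java.util.regex.Pattern;

class UnicodeLetterDemo {
    static final Pattern ASCII_ONLY = Pattern.compile("[a-zA-Z]+");
    // \p{L} matches any character in the Unicode "letter" category.
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");

    static boolean asciiOnly(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    static boolean anyLetter(String s) {
        return ANY_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // "Mueller" spelled with u-umlaut
        System.out.println("ASCII class accepts it: " + asciiOnly(name));
        System.out.println("\\p{L} accepts it: " + anyLetter(name));
    }
}
```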
Unnsse Khan
Ranch Hand

Joined: Nov 12, 2001
Posts: 511

Thanks for the advice...

My regex is now this:

String nameRegex = "^[\\p{L}]++(?:[',.-][\\p{L}]++)*+$";

The problem with this regex is that it won't let things pass validation in situations representing:

Joe A.

or even

Joe A
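The rejections described above can be reproduced with a small sketch that uses the nameRegex string exactly as posted (the class and method names are made up for illustration). The space is not in the punctuation class, and a period must be followed by a letter, so both inputs fail:

```java
import java.util.regex.Pattern;

class PostedRegexCheck {
    // The pattern exactly as quoted in the post above.
    static final String NAME_REGEX = "^[\\p{L}]++(?:[',.-][\\p{L}]++)*+$";
    static final Pattern NAME = Pattern.compile(NAME_REGEX);

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Joe", "Anne-Marie", "Joe A", "Joe A."}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```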

What I want is the ability to put a space (for a middle name) and a dash (-) for names such as:


Many, many thanks!

Sincerely yours,
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Okay, so add a space to the character class, and move the period to the end.
Purushoth Thambu
Ranch Hand

Joined: May 24, 2003
Posts: 425
Isn't it your requirement that the first letter should be in upper case? I was (and still am) confounded by how the regular expression "^[\\p{L}]++" will ensure that the first letter is in uppercase. If your 4 rules still hold, I believe the one below will be the right one.
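The pattern posted here is not preserved. As a sketch only, one way to encode the four original rules might look like the following; the class name is hypothetical, and requiring a capital after each hyphen is an assumption the rules do not spell out:

```java
import java.util.regex.Pattern;

class StrictNameRules {
    // Rule 1: capital first letter, lowercase afterwards.
    // Rule 2: the only permitted symbol is a hyphen.
    // Rules 3 and 4: no digits and no spaces (neither appears in any class).
    // ASSUMPTION: each hyphen starts a new capitalized part, as in "Anne-Marie".
    static final Pattern STRICT =
            Pattern.compile("^[A-Z][a-z]*(?:-[A-Z][a-z]*)*$");

    static boolean isValid(String name) {
        return STRICT.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"John", "Anne-Marie", "john", "JOhn", "John3", "John D"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```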
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
I was just fine-tuning the earlier suggestions, but you're right, I left out the initial capital requirement. Also, I put the space in the wrong place; the way I wrote it, it means "an apostrophe or anything in the range of comma through space". That wouldn't even compile, so let's try it again:
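The corrected pattern is missing from this copy of the thread. The sketch below is one plausible reading of the description, not the pattern actually posted: the class name, the regex, and the use of \p{Lu} for the initial capital are all assumptions. It also shows why a space placed after the hyphen is rejected, since ",- " reads as a range whose start comes after its end:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NamePatternSketch {
    // ASSUMED shape: an uppercase letter, more letters, then groups of
    // (apostrophe, comma, space or hyphen + letters), and an optional
    // trailing period. The hyphen is last in the class, so it is a literal.
    static final Pattern NAME =
            Pattern.compile("^\\p{Lu}\\p{L}*+(?:[', -]\\p{L}++)*+\\.?$");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    // True if Pattern.compile accepts the regex, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "[',- ]" asks for the range comma..space, which is out of order.
        System.out.println("misplaced space compiles: " + compiles("[',- ]"));
        for (String s : new String[] {"Joe A.", "Joe A", "O'Meara", "joe", "Joe-"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```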
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
Those initial requirements would also disallow names like "O'Meara", "McNamara" and "de Queiroz". I would think they need to be loosened up anyway. Overly restrictive validation rules create more problems than they solve, in my opinion.

"I'm not back." - Bill Harding, Twister
Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

And there are people who insist on not having capital letters at the beginning of their names. And there are people who have first names like "Billy Joe". And... the list goes on. I agree with Alan Moore and the others who said that the regex is solving a problem that does not exist.