Regular expression help with unicode

Sri Ponnapalli
Greenhorn

Joined: Jan 25, 2012
Posts: 5
Hi,

I'm looking for a regular expression that matches a String consisting only of Unicode letters and numeric digits, with at least one of each.

The closest I've gotten is the Java String "^[\\p{L}\\p{N}]+$". The problem is that it still matches strings that are entirely digits or entirely letters, so strings like "1234567" or "abcdef" still match.

Does anyone know how to fix this?

Thanks,
Sri
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

This seems to work. There's probably a cleaner way with a single regex, and, honestly, it would be clearer just to use 2 different regexes--one that says "contains at least one letter" and one that says "contains at least one digit", and then test that it matches regex1 AND matches regex2.



Breaking it apart:


The idea is that if there's at least one digit and at least one letter, and every character is either a digit or a letter, then somewhere there must be either a digit followed by a letter or a letter followed by a digit. That pair could be at the beginning, the middle, or the end, so the "either/or" character classes on each side need the zero-or-more quantifier. Writing L for a letter and N for a digit: zero or more Ls and/or Ns, then LN or NL, then zero or more Ls and/or Ns.
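
For readers following along, here is a minimal sketch of a pattern that fits the description above. It is reconstructed from that description and is not necessarily the exact regex that was posted; the class and method names (MixedLettersAndDigits, isMixed) are made up for illustration.

```java
import java.util.regex.Pattern;

final class MixedLettersAndDigits {
    // Zero or more letters/digits, then a letter-digit or digit-letter pair,
    // then zero or more letters/digits. No ^ or $ here because the helper
    // below uses matches(), which tests the whole input.
    static final Pattern MIXED = Pattern.compile(
            "[\\p{L}\\p{N}]*(?:\\p{L}\\p{N}|\\p{N}\\p{L})[\\p{L}\\p{N}]*");

    // Hypothetical helper: true if s is only letters and digits, with at least one of each.
    static boolean isMixed(String s) {
        return MIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMixed("abc123"));   // true
        System.out.println(isMixed("1234567"));  // false: digits only
        System.out.println(isMixed("abcdef"));   // false: letters only
        System.out.println(isMixed("abc 123"));  // false: the space is neither
    }
}
```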

Also note that if you're using String.matches() or Matcher.matches(), you don't need the ^ and $, since matches() attempts to match against the entire input anyway.
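
A small sketch of that difference, using the pattern from the original question without its anchors. The class and helper names (MatchesVsFind, wholeMatch, partialMatch) are just for illustration.

```java
import java.util.regex.Pattern;

final class MatchesVsFind {
    // matches() succeeds only if the entire input matches the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() succeeds if any substring matches, so anchors matter with it.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[\\p{L}\\p{N}]+";
        System.out.println(wholeMatch(regex, "abc-123"));           // false
        System.out.println(partialMatch(regex, "abc-123"));         // true
        System.out.println(partialMatch("^" + regex + "$", "abc-123")); // false
        System.out.println("abc123".matches(regex));                // true
    }
}
```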

Sri Ponnapalli
Greenhorn

Joined: Jan 25, 2012
Posts: 5
Great, this works like magic!! Thank you very much Jeff, for a very quick response!

Sri
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

You're very welcome!

Be warned, though: I expect it runs pretty slowly. If you're doing a large number of tests in a row, or testing against a very large input, it might be a problem. In that case, you'll have to either find somebody who's better at regex than I am, or just break it into two separate tests (or perhaps 3) like I suggested.
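
A rough sketch of the separate-tests approach, assuming three checks: only letters and digits, at least one letter, and at least one digit. The class, field, and method names are invented for illustration, and no timing claims are made here; measure before relying on it for speed.

```java
import java.util.regex.Pattern;

final class SeparateChecks {
    static final Pattern ONLY_LETTERS_AND_DIGITS = Pattern.compile("[\\p{L}\\p{N}]+");
    static final Pattern HAS_LETTER = Pattern.compile("\\p{L}");
    static final Pattern HAS_DIGIT = Pattern.compile("\\p{N}");

    // Hypothetical helper combining the three tests with AND.
    static boolean isMixed(String s) {
        return ONLY_LETTERS_AND_DIGITS.matcher(s).matches() // whole input is letters/digits
                && HAS_LETTER.matcher(s).find()             // contains at least one letter
                && HAS_DIGIT.matcher(s).find();             // contains at least one digit
    }

    public static void main(String[] args) {
        System.out.println(isMixed("abc123"));  // true
        System.out.println(isMixed("1234567")); // false
        System.out.println(isMixed("abcdef"));  // false
    }
}
```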
Sri Ponnapalli
Greenhorn

Joined: Jan 25, 2012
Posts: 5
Got it. It is not a very high-volume scenario, so this should be good. One other question: since this solution is internationalized, I'm trying to test with Unicode characters. Do you know any site where I can get a good Unicode character set to test this with?

Thanks,
Sri
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

Sri Ponnapalli wrote: Do you know any site where I can get good unicode character set to test this with?


There's a pretty decent sampling at http://en.wikipedia.org/wiki/List_of_Unicode_characters.
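
As a starting point, here is a sketch with a handful of non-Latin samples written as Unicode escapes so the source file stays ASCII. The sample strings are illustrative picks, not an exhaustive test set, and the pattern is the same assumed one described earlier in the thread (letters/digits, a letter-digit or digit-letter pair, letters/digits). Two things worth noticing: \p{N} covers more than decimal digits (for example Roman numeral characters), and a combining accent is a mark rather than a letter, so decomposed text can fail the test.

```java
import java.util.regex.Pattern;

final class UnicodeSamples {
    // Assumed pattern: only letters and numbers, with at least one of each.
    static final Pattern MIXED = Pattern.compile(
            "[\\p{L}\\p{N}]*(?:\\p{L}\\p{N}|\\p{N}\\p{L})[\\p{L}\\p{N}]*");

    static boolean isMixed(String s) {
        return MIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"French letters + ASCII digits", "\u00e9t\u00e92012"},
            {"Cyrillic letter + Arabic-Indic digit", "\u0416\u0663"},
            {"CJK letters + Devanagari digit", "\u65e5\u672c\u0967"},
            {"ASCII letters + Roman numeral eight (U+2167)", "abc\u2167"},
            {"e + combining acute (U+0301) + digit", "e\u03011"},
            {"digits only, two scripts", "\u0663\u0967"},
        };
        for (String[] sample : samples) {
            System.out.println(sample[0] + ": " + isMixed(sample[1]));
        }
    }
}
```

The first four samples should report true and the last two false. If Roman numerals and fractions should not count as digits, \p{Nd} (decimal digits only) may be closer to what you want than \p{N}.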

Oh, and welcome to the Ranch!
Darryl Burke
Bartender

Joined: May 03, 2008
Posts: 4527

Sri, please BeForthrightWhenCrossPostingToOtherSites
http://www.java-forums.org/new-java/54512-help-unicode-regular-expression.html
https://forums.oracle.com/forums/thread.jspa?threadID=2337867


luck, db
There are no new questions, but there may be new answers.
Sri Ponnapalli
Greenhorn

Joined: Jan 25, 2012
Posts: 5
Sorry Darryl, I wasn't aware of the cross-posting rules. Will certainly play by the rules in future.

Thanks again Jeff, your answer DID help me a lot! (I didn't hear back on any of the other forums.)

Sri
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109

Sri Ponnapalli wrote: Sorry Darryl, I wasn't aware of the cross-posting rules. Will certainly play by the rules in future.

Thanks again Jeff, your answer DID help me a lot! (I didn't hear on any of the other forums)

Sri


You're welcome! Glad I could help!

Please go back to the other forums and let them know that it's been answered (so people don't waste their time) and provide a link to this one in case anybody is interested in the solution, or wants to improve on it.
Sri Ponnapalli
Greenhorn

Joined: Jan 25, 2012
Posts: 5
I checked, and Darryl had already done it.

Sri
 