Regular expression

Carlos Bonzilla
Greenhorn

Joined: May 03, 2011
Posts: 17
I have a field on my web page where users can enter comments. Only certain characters are allowed. For instance:
1. "I am allowed" is ok.
2. "I am not allowed #" is not ok.
3. "ÅÄÖåäö-,.:'éüáèç%()@" is ok.

Any suggestions for a regular expression that will solve this?
Best regards
/Carlos
Darryl Burke
Bartender

Joined: May 03, 2008
Posts: 4523
    

Here are a couple of learning resources for regex:
http://www.regular-expressions.info/
http://download.oracle.com/javase/tutorial/essential/regex/index.html

And of course there's the java.util.regex.Pattern API.

Show your best efforts in the form of an SSCCE, and someone will help you do the fine-tuning if needed.


luck, db
There are no new questions, but there may be new answers.
Ryan Beckett
Ranch Hand

Joined: Feb 22, 2009
Posts: 192


Since I've just given you the answer, at least let me explain it, so you can learn how I did it.

Start off by reviewing the literature in the Regex API linked above. It's a good reference, but if you've never done regular expressions, check out the tutorials first.

(1)

This is the range of the specific Latin Unicode characters you expect to be in the input. Simple enough. See the Latin Unicode chart for details.

(2)

This means match (or allow) any word character (0-9, A-Z, a-z, or underscore).

(3)

Allow whitespace characters.

(4)

Allow any punctuation character.

(5)

Allow all of the previously declared characters "and not" this one. Whatever punctuation you don't want needs to be included inside the brackets.

(6)

This is a greedy quantifier. It says to allow "one or more of all of these characters" in the string. Note that the whole character class must be enclosed in square brackets before the quantifier is applied.

Also, note that the backslashes in these specifiers are doubled because the pattern is written inside a Java string literal. Hope that helps. Good luck.
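
The pattern itself is not shown in this copy of the thread, so here is a minimal sketch of how the six pieces described above could fit together. It is an assumption, not the original expression: the class name `CommentValidator`, the method `isValid`, and the choice of `#` as the only excluded character are made up for illustration, and the Latin range `\u00C0-\u00FF` is taken from a later reply.

```java
import java.util.regex.Pattern;

class CommentValidator {
    // Hypothetical pattern assembled from the six steps; not the original from the thread.
    // (1) U+00C0 to U+00FF : accented Latin-1 letters
    // (2) \w               : word characters (0-9, A-Z, a-z, underscore)
    // (3) \s               : whitespace
    // (4) \p{Punct}        : ASCII punctuation
    // (5) &&[^#]           : intersection, i.e. "and not" the characters listed here
    // (6) +                : one or more of the allowed characters
    private static final Pattern ALLOWED =
            Pattern.compile("[[\\u00C0-\\u00FF\\w\\s\\p{Punct}]&&[^#]]+");

    static boolean isValid(String comment) {
        // matches() requires the whole input to fit the pattern
        return ALLOWED.matcher(comment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("I am allowed"));        // true
        System.out.println(isValid("I am not allowed #"));  // false
        // Sample 3 from the question, written with Unicode escapes
        System.out.println(isValid(
                "\u00C5\u00C4\u00D6\u00E5\u00E4\u00F6-,.:'\u00E9\u00FC\u00E1\u00E8\u00E7%()@")); // true
    }
}
```

Because `matches()` is used rather than `find()`, no `^` or `$` anchors are needed.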
Carlos Bonzilla
Greenhorn

Joined: May 03, 2011
Posts: 17
Ryan Beckett wrote:

See Latin Unicode.


Thanks for your help, Ryan. I think some more characters need to be excluded. For instance, the string "Hey how are u$[*?" passed the test although it shouldn't.

Best regards
/Carlos
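
A likely reason that string passes is that `\p{Punct}` accepts every ASCII punctuation character, including `$`, `[`, `*` and `?`. One alternative is to list only the punctuation you want. The sketch below assumes the allowed punctuation is exactly what appears in sample 3 of the question (`-,.:'%()@`); that list, the class name `PunctuationCheck`, and the Latin range are assumptions to adjust to the real requirement.

```java
import java.util.regex.Pattern;

class PunctuationCheck {
    // \p{Punct} on its own: every ASCII punctuation character is accepted.
    static final Pattern ANY_PUNCT = Pattern.compile("\\p{Punct}+");

    // Assumed allow-list, taken only from sample 3 in the question.
    static final Pattern ALLOW_LIST =
            Pattern.compile("[\\u00C0-\\u00FF\\w\\s\\-,.:'%()@]+");

    public static void main(String[] args) {
        System.out.println(ANY_PUNCT.matcher("$[*?").matches());                      // true
        System.out.println(ALLOW_LIST.matcher("Hey how are u$[*?").matches());        // false
        System.out.println(ALLOW_LIST.matcher("Hey, how are you (today)").matches()); // true
    }
}
```

An explicit allow-list tends to be easier to maintain than a growing exclusion list, since any character not mentioned is rejected by default.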
Ryan Beckett
Ranch Hand

Joined: Feb 22, 2009
Posts: 192
Try this.

Carlos Bonzilla
Greenhorn

Joined: May 03, 2011
Posts: 17
Ryan Beckett wrote: Try this.



Thanks for your explanation, Ryan. I am very new to regular expressions, so your links will be read for sure.

Best regards
/Carlos
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    

I'd probably use \\p{L} and \\d instead of \u00C0-\u00FF and \\w. \\p{L} includes a-z and A-Z, so \\w can be replaced by \\d (note that this drops the underscore, which \\w also matches). \\p{L} covers all Unicode letters, including the accented ones used in Spanish, the Scandinavian languages, etc., as well as letters outside the \u00C0-\u00FF range.
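
To illustrate the difference, here is a small sketch comparing the two letter/digit classes on their own, without whitespace or punctuation. The class name `LetterClasses` and the sample inputs are made up for the example.

```java
import java.util.regex.Pattern;

class LetterClasses {
    // Latin-1 range plus \w: ASCII letters, digits, underscore, and U+00C0 to U+00FF.
    static final Pattern NARROW = Pattern.compile("[\\u00C0-\\u00FF\\w]+");

    // \p{L} plus \d: any Unicode letter and ASCII digits; no underscore.
    static final Pattern BROAD = Pattern.compile("[\\p{L}\\d]+");

    public static void main(String[] args) {
        // A word with letters outside Latin-1 (U+0141 and U+017A), written with escapes
        String word = "\u0141\u00F3d\u017A";
        System.out.println(NARROW.matcher(word).matches());  // false
        System.out.println(BROAD.matcher(word).matches());   // true
        System.out.println(NARROW.matcher("a_b").matches()); // true
        System.out.println(BROAD.matcher("a_b").matches());  // false
    }
}
```

If underscores should still be accepted with the `\p{L}` version, `_` has to be added to the class explicitly.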

