Regex Help Needed

Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Guys,

I have the following conditions for which I would need a Regex pattern:

Allowable characters are [a-z.A-Z,0-9,!,",§,%,&,/,(,),=,?,*,+,-,_]. The string must contain at least one uppercase and one lowercase character and at least one number, and the number should not be the first character.

I have a pattern built for the above case, but as a combination of three different patterns. I would like to have them as one.


SCJP 1.4, SCWCD 1.4 - Hints for you, Certified Scrum Master
Did a rm -R / to find out that I lost my entire Linux installation!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

This sounds like something I wouldn't do with regex but with a simple loop over the string's characters. Keep flags that indicate if the requirements have been met:
As for checking if a character is valid, I tend to use String.indexOf for that:
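A minimal sketch of such a loop, not the poster's original code. It assumes the special characters are exactly `!"§%&/()=?*+-_` (one reading of the list in the question); the class and method names are made up for illustration.

```java
final class LoopPasswordCheck {
    // Assumed set of special characters; the section sign is written as a Unicode escape.
    private static final String SPECIALS = "!\"\u00A7%&/()=?*+-_";

    static boolean isValid(String s) {
        if (s.isEmpty() || isDigit(s.charAt(0))) {
            return false; // empty, or a number is the first character
        }
        // One flag per requirement.
        boolean hasUpper = false;
        boolean hasLower = false;
        boolean hasDigit = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                hasUpper = true;
            } else if (c >= 'a' && c <= 'z') {
                hasLower = true;
            } else if (isDigit(c)) {
                hasDigit = true;
            } else if (SPECIALS.indexOf(c) < 0) {
                return false; // indexOf returns -1 for a character that is not allowed
            }
        }
        return hasUpper && hasLower && hasDigit;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd!")); // true
        System.out.println(isValid("1Password")); // false
    }
}
```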


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18508
    
  40


The "allowable characters" and "the number should not be the first character" parts can easily be done with the regex. The "at least one big and small character case, at least one number" part can also be done, by prepending a bunch of positive lookaheads to the regex: one lookahead for each of the three items.
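For readers new to the term, here is a small sketch of what a positive lookahead does, on a deliberately simpler pattern than the one asked for. The class name and the example pattern are illustrative assumptions, not a solution to the original question.

```java
import java.util.regex.Pattern;

final class LookaheadDemo {
    // (?=.*[0-9]) only asserts that a digit occurs somewhere ahead.
    // It consumes no input, so [a-z0-9]+ still has to match the whole string.
    static final Pattern LOWER_WITH_A_DIGIT = Pattern.compile("(?=.*[0-9])[a-z0-9]+");

    static boolean matches(String s) {
        return LOWER_WITH_A_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc1")); // true: a digit is present, all characters allowed
        System.out.println(matches("abc"));  // false: the lookahead finds no digit
        System.out.println(matches("ABC1")); // false: the lookahead passes, the class does not
    }
}
```

Each additional requirement of the "contains at least one ..." kind becomes one more lookahead in front of the part that consumes the characters.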

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

I'd still prefer a loop. Not only will it most likely be faster (untested) but it will probably also be easier to read. Instead of having a long regex that will probably be a bit complex with the lookaheads you see just what is done:
* each character is checked to be valid
* each character is checked against the possible character classes (using a flag per class)
* at the end you validate that enough of these flags are true
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Henry Wong wrote:
The "allowable characters" and "the number should not be the first character" parts can easily be done with the regex. The "at least one big and small character case, at least one number" part can also be done, by prepending a bunch of positive lookaheads to the regex: one lookahead for each of the three items.

Henry


Look ahead???
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Oh, you meant the pattern...
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Rob Prime wrote:I'd still prefer a loop. Not only will it most likely be faster (untested) but it will probably also be easier to read. Instead of having a long regex that will probably be a bit complex with the lookaheads you see just what is done:
* each character is checked to be valid
* each character is checked against the possible character classes (using a flag per class)
* at the end you validate that enough of these flags are true


The desired regex can be constructed so as to be readable and maintainable -
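A sketch of the fragment approach, not the code originally posted. It assumes the special characters are `!"§%&/()=?*+-_` and already checks every character of the input; all names are hypothetical.

```java
import java.util.regex.Pattern;

final class FragmentPasswordPattern {
    // Building blocks for character classes. The hyphen is placed last so it is a literal.
    private static final String LETTERS = "A-Za-z";
    private static final String DIGITS = "0-9";
    private static final String SPECIALS = "!\"\u00A7%&/()=?*+_-";

    // One lookahead per "contains at least one ..." requirement.
    private static final String HAS_UPPER = "(?=.*[A-Z])";
    private static final String HAS_LOWER = "(?=.*[a-z])";
    private static final String HAS_DIGIT = "(?=.*[0-9])";

    // The first character may be anything allowed except a digit.
    private static final String FIRST_CHAR = "[" + LETTERS + SPECIALS + "]";
    private static final String OTHER_CHARS = "[" + LETTERS + DIGITS + SPECIALS + "]*";

    static final Pattern PASSWORD = Pattern.compile(
            HAS_UPPER + HAS_LOWER + HAS_DIGIT + FIRST_CHAR + OTHER_CHARS);

    static boolean isValid(String candidate) {
        // matches() requires the whole input to match, so no ^ or $ is needed.
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd!")); // true
        System.out.println(isValid("1Password")); // false
    }
}
```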



The pattern can be compiled just once and is thread safe.

It is then easy to use -


As to whether or not it is faster than a load of loops - probably not but it is compact and readable and maintainable.


Retired horse trader.
 Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

I managed to get each of the conditions done as a separate pattern so that I can be efficient and faster with the matching. The only thing that I miss is the check for not allowing the number as the first character. How can I do this using Regex??
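One possible way to express "the first character is not a number" as its own pattern is a negative lookahead. This is a sketch under the assumption that the other checks stay in their separate patterns; the names are hypothetical.

```java
import java.util.regex.Pattern;

final class FirstCharCheck {
    // (?![0-9]) fails the match if the next character is a digit.
    // At the start of the input, that is the first character.
    static final Pattern NOT_DIGIT_FIRST = Pattern.compile("(?![0-9]).+");

    static boolean startsWithoutDigit(String s) {
        return NOT_DIGIT_FIRST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithoutDigit("a1")); // true
        System.out.println(startsWithoutDigit("1a")); // false
    }
}
```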
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Jothi Shankar Kumar wrote:I managed to get each of the conditions done as a separate pattern so that I can be efficient and faster with the matching. The only thing that I miss is the check for not allowing the number as the first character. How can I do this using Regex??


Obviously my other response is invisible.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18508
    
  40

James Sabre wrote:
Jothi Shankar Kumar wrote:I managed to get each of the conditions done as a separate pattern so that I can be efficient and faster with the matching. The only thing that I miss is the check for not allowing the number as the first character. How can I do this using Regex??


Obviously my other response is invisible.


As a side note, I am not a fan of providing regex solutions, but won't remove them because it is unlikely that a regex is a homework (or interview) question... but IMHO, people do need to get to the regex pattern themselves. Otherwise, they end up with a pattern that they do not understand.

Henry
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Henry Wong wrote:
James Sabre wrote:
Jothi Shankar Kumar wrote:I managed to get each of the conditions done as a separate pattern so that I can be efficient and faster with the matching. The only thing that I miss is the check for not allowing the number as the first character. How can I do this using Regex??


Obviously my other response is invisible.


As a side note, I am not a fan of providing regex solutions, but won't remove them because it is unlikely that a regex is a home work (or interview) question... but IMHO, people do need to get to the regex pattern themselves. Otherwise, they end up with a pattern that they do not understand.

Henry


I agree and don't like just posting code, but the general prejudice against regex exhibited in these forums gives the impression that regex are an invention of the devil and should be avoided at all cost. Regex are a tool, and any tool has tasks it is good for and tasks it is bad for. I think this task is ideal for a regex. Okay, it is not a trivial regex, but by posting code in the form I have, I hope I have demonstrated that regex can be written in such a manner that they are maintainable and readable. If I hadn't posted code, the non-regex solution might have prevailed (it might still prevail), but at least I have tried to counter the prejudice.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41070
    
  43
Henry Wong wrote:IMHO, people do need to get to the regex pattern themselves. Otherwise, they end up with a pattern that they do not understand.

+1. Unless Jothi takes away a solid understanding of the concept of lookahead, this discussion will have been like handing out fish instead of teaching to fish.


Ping & DNS - my free Android networking tools app
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Ulf Dittmer wrote:
Henry Wong wrote:IMHO, people do need to get to the regex pattern themselves. Otherwise, they end up with a pattern that they do not understand.

+1. Unless Jothi takes away a solid understanding of the concept of lookahead, this discussion will have been like handing out fish instead of teaching to fish.


But he also needs to understand that there are ways to fish that don't involve just throwing dynamite into the fish pond.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

James Sabre wrote:
Rob Prime wrote:I'd still prefer a loop. Not only will it most likely be faster (untested) but it will probably also be easier to read. Instead of having a long regex that will probably be a bit complex with the lookaheads you see just what is done:
* each character is checked to be valid
* each character is checked against the possible character classes (using a flag per class)
* at the end you validate that enough of these flags are true


The desired regex can be constructed so as to be readable and maintainable -



The pattern can be compiled just once and is thread safe.

It is then easy to use -


As to whether or not it is faster than a load of loops - probably not but it is compact and readable and maintainable.

If you put it like that then there is no reason to choose one over the other, unless performance is a real issue. (Remember, don't optimize prematurely.)

That said, there is one flaw in your regex: your code checks for the presence of an allowed character, but it does not check for the absence of non-allowed characters. That makes the regex a bit more complex; my simple loop is still easy.
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Rob Prime wrote:
If you put it like that then there is no reason to choose one over the other, unless performance is a real issue. (Remember, don't optimize prematurely.)

That said, there is one flaw in your regex: your code checks for the presence of an allowed character, but it does not check for the absence of non-allowed characters. That makes the regex a bit more complex; my simple loop is still easy.



Yep - I rushed. Simple to fix
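The difference between the two checks can be sketched as follows. This is not the fix that was posted; it only contrasts "contains an allowed character" with "consists only of allowed characters", using an assumed character set and hypothetical names.

```java
import java.util.regex.Pattern;

final class PresenceVsWholeString {
    private static final String ALLOWED = "[A-Za-z0-9!\"\u00A7%&/()=?*+_-]";

    // find() succeeds as soon as one allowed character occurs anywhere.
    static final Pattern CONTAINS_ALLOWED = Pattern.compile(ALLOWED);
    // matches() with a repeated class succeeds only if every character is allowed.
    static final Pattern ONLY_ALLOWED = Pattern.compile(ALLOWED + "+");

    static boolean containsAllowed(String s) {
        return CONTAINS_ALLOWED.matcher(s).find();
    }

    static boolean onlyAllowed(String s) {
        return ONLY_ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsAllowed("abc#")); // true, although '#' is not allowed
        System.out.println(onlyAllowed("abc#"));     // false
    }
}
```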



Blame me, not regex.

I can see other ways to improve on this but ...
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18508
    
  40


Also... the contains special requirement was not asked for -- no such requirement. The regex can get a bit shorter.

Henry
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Here is what I do...



The only thing that fails here is the check for the allowed characters. At present, with the code above, it goes inside the if condition only if there is at least one occurrence of the allowed characters. But I only want to check that any special characters present are among the allowable characters. In essence, the existence of the allowed characters is not mandatory.
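One way to keep separate patterns and still make the special characters optional is to turn the question around and look for a character that is not allowed. A sketch with hypothetical pattern names and an assumed character set, not the code from the post above:

```java
import java.util.regex.Pattern;

final class SeparatePatternCheck {
    static final Pattern UPPER = Pattern.compile("[A-Z]");
    static final Pattern LOWER = Pattern.compile("[a-z]");
    static final Pattern DIGIT = Pattern.compile("[0-9]");
    static final Pattern STARTS_WITH_DIGIT = Pattern.compile("^[0-9]");
    // Negated class: matches any single character that is NOT in the allowed set.
    static final Pattern DISALLOWED =
            Pattern.compile("[^A-Za-z0-9!\"\u00A7%&/()=?*+_-]");

    static boolean isValid(String s) {
        return UPPER.matcher(s).find()
                && LOWER.matcher(s).find()
                && DIGIT.matcher(s).find()
                && !STARTS_WITH_DIGIT.matcher(s).find()
                && !DISALLOWED.matcher(s).find(); // passes when no special character is present
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd"));  // true: no special character needed
        System.out.println(isValid("Passw0rd#")); // false: '#' is not allowed
    }
}
```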
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

James Sabre wrote:Yep - I rushed. Simple to fix



Blame me, not regex.

No offense, but this just proves my point: with a loop, even rushed code would probably be correct. It's just so much easier to make mistakes with regexes.

I once again quote Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Rob Prime wrote:
No offense, but this just proves my point: with a loop, even rushed code would probably be correct. It's just so much easier to make mistakes with regexes.

I once again quote Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.


A predictable response. The error in the regex was due to my misunderstanding of the requirement, not in the regex implementation of what I mistakenly understood to be the requirement. I would have made exactly the same mistake using loops.

The "Jamie Zawinski" quote applies equally well to any tool that is used without the knowledge of when it is right to use it. I see many threads in forums where people try to use regex as the primary tool for parsing HTML or CSV files. In neither case is regex the right tool and I always recommend against it. I will recommend against using any tool for the sake of using the tool.

The OP's problem cries out for regex and is the sort of problem that regex are designed for. If I had just written the regex in one line e.g.

then I could have been criticized since as a whole it is difficult to read. By presenting the regex in fragments it is very very readable.

I shall continue to be a regex evangelist but only when it is the right tool for the job. I shall continue preaching the use of the 'fragment' approach when building a regex. Even though obviously outnumbered, in these forums I shall continue expressing my regex views so as to add some balance.


Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18508
    
  40

On one side...

Rob Prime wrote:
No offense, but this just proves my point: with a loop, even rushed code would probably be correct. It's just so much easier to make mistakes with regexes.


Which I totally agree with. There are just too many newbies to regex who do it by trial and error and wind up with a mess that barely works, without being sure why.

On the other side...

James Sabre wrote:
A predictable response. The error in the regex was due to my misunderstanding of the requirement, not in the regex implementation of what I mistakenly understood to be the requirement. I would have made exactly the same mistake using loops.


Which I also agree with. I have created some (actually many) regex-based solutions that were shorter, easier to code, and easier to understand than the equivalent loops.


And after reading the two responses... I am in more agreement with... James. Why?

James Sabre wrote:
I shall continue to be a regex evangelist but only when it is the right tool for the job. I shall continue preaching the use of the 'fragment' approach when building a regex. Even though obviously outnumbered, in these forums I shall continue expressing my regex views so as to add some balance.


Totally agree. Regex is a tool. And a really cool one at that. It should not be abandoned because it gets abused once in a while -- okay, maybe it gets abused much more than I would like to see happen.


This is, of course, completely my opinion. It is not like I am wearing a zebra skin uniform, with referee written on the back...


Henry
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18508
    
  40


And BTW...

James Sabre wrote:
I agree and don't like just posting code but the general prejudice against regex exhibited in these forums gives the impression that regex are an invention of the devil and should be avoided at all cost....


I think the Javaranch is more balanced than most, in this regard. And there are many here who are very comfortable with regexes. Personally, I like to push the limits of a regex, even when it is not the ideal solution, for small throwaway file parsing programs. Why? The projects are one shot deals. And the best way to get comfortable with using it, is to push its limits.

Henry
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

James Sabre wrote:The OP's problem cries out for regex and is the sort of problem that regex are designed for.

I disagree. It can be done with regexes but it's not necessary. Like I said, for me it cried out loop.

I shall continue to be a regex evangelist but only when it is the right tool for the job.

Oh, I couldn't agree more. I use regexes a lot of the time as well. I just wouldn't in this case.

I shall continue preaching the use of the 'fragment' approach when building a regex.

And I applaud you for it. I only wish I had done that more often; now I just have long blocks of comments explaining it...
I have just one piece of advice for you: make all the regex strings final. That way the concatenation will be done by the compiler, not by the JVM at runtime.
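A small sketch of why `final` matters here, with made-up fragment names. When every operand is a `final` String initialised from a constant, the concatenation is a compile-time constant expression, and such constants are interned like literals:

```java
final class ConstantFragments {
    static final String LOWER = "a-z";
    static final String UPPER = "A-Z";
    // All operands are compile-time constants, so the compiler folds this
    // into the single literal "[a-zA-Z]" in the class file.
    static final String LETTER_CLASS = "[" + LOWER + UPPER + "]";

    static boolean foldedAtCompileTime() {
        // Constant expressions are interned, so the reference comparison holds.
        // (Reference comparison is used only to demonstrate the folding.)
        return LETTER_CLASS == "[a-zA-Z]";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // true
    }
}
```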

Even though obviously outnumbered, in these forums I shall continue expressing my regex views so as to add some balance.

I wouldn't say you're outnumbered. I guess it's more of a "teach it slowly" practice that prevents trial-and-error regexes like Henry mentioned.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Thanks for all your suggestions and inputs. Just came back from a weekend vacation. Still trying to figure out how to make the Pattern.matches(REGEX_ALLOWED_CHARACTERS, args[0]) check optional in my if condition.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Any help?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    
  18

PatienceIsAVirtue.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37940
    
  22
Too difficult for "beginning". Moving thread.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Modified the program as below:


Even then, not able to skip the optional test for the allowed characters. Suggestions appreciated.
Joe Harry
Ranch Hand

Joined: Sep 26, 2006
Posts: 9345
    
    2

Do I add escape sequences to the REGEX_PASSWORD_ALLOWED_CHARACTERS pattern in my code above??
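On the escaping question, a hedged sketch: inside a character class most of the listed characters are literal, but a hyphen between two characters forms a range. If the class is written in the order of the original list, `+-_` would span everything from `+` to `_`, which includes digits, uppercase letters and characters such as `<`. The pattern below is an assumption about what REGEX_PASSWORD_ALLOWED_CHARACTERS is meant to cover, reduced to the special characters; the names are hypothetical.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // In Java source, \" is the string escape for a double quote, and \\-
    // reaches the regex engine as \- which is a literal hyphen.
    static final Pattern SPECIAL = Pattern.compile("[!\"\u00A7%&/()=?*+\\-_]");

    // Same characters with an unescaped hyphen: +-_ is read as a range.
    static final Pattern LOOSE_SPECIAL = Pattern.compile("[!\"\u00A7%&/()=?*+-_]");

    static boolean isSpecial(char c) {
        return SPECIAL.matcher(String.valueOf(c)).matches();
    }

    static boolean looseIsSpecial(char c) {
        return LOOSE_SPECIAL.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSpecial('-'));      // true
        System.out.println(isSpecial('<'));      // false
        System.out.println(looseIsSpecial('<')); // true: '<' falls inside the range + to _
    }
}
```

Placing the hyphen first or last in the class avoids the range without any escape.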
 