Regex question

trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613
I need to run the following check on an entered string:
no special characters other than colon, hyphen, period and underscore may be entered.
To achieve this I use the pattern below, but the expression does not work for the hyphen (-). What do I need to change to include the hyphen in the allowed list?



In the above pattern, if I include -, it does not work.

Also, how can I achieve the following?

The string should not contain a sequence of multiple consecutive special chars. How do I check this?


thanks
Trupti
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    
Put the hyphen last in the character class -- anywhere else, and the regex engine will try to treat it as a range of characters (BTW, you can also escape it with a backslash).
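
To illustrate the point about hyphen placement, here is a minimal sketch. The class name HyphenInCharClass, the helper hasInvalidChar and all three patterns are assumptions made for illustration, since the original pattern is not shown in the thread. It compares a hyphen placed last, an escaped hyphen, and a hyphen placed between ':' and '_', where it forms a range.

```java
import java.util.regex.Pattern;

class HyphenInCharClass {
    // Hyphen last: it is a literal, so '-' is part of the allowed set.
    static final Pattern HYPHEN_LAST = Pattern.compile("[^a-zA-Z0-9:_.-]");
    // Hyphen escaped: literal wherever it appears in the class.
    static final Pattern HYPHEN_ESCAPED = Pattern.compile("[^a-zA-Z0-9:\\-_.]");
    // Hyphen between ':' and '_': read as the range ':' through '_',
    // so '-' itself is NOT in the allowed set.
    static final Pattern HYPHEN_AS_RANGE = Pattern.compile("[^a-zA-Z0-9:-_.]");

    // True when the input contains at least one character outside the allowed set.
    static boolean hasInvalidChar(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasInvalidChar(HYPHEN_LAST, "alex-zang"));     // false
        System.out.println(hasInvalidChar(HYPHEN_ESCAPED, "alex-zang"));  // false
        System.out.println(hasInvalidChar(HYPHEN_AS_RANGE, "alex-zang")); // true
    }
}
```

With the hyphen in the middle, the negated class no longer lists '-' as allowed, which would explain a pattern that "does not work" once the hyphen is added.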

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    

1) Since '-' is a metacharacter inside a character set, it must be the first or last member of the set to keep its literal meaning.

2) You can refer to the content of a previous group using "\n", where n is the group number. So, assuming you have only the one capturing group, two identical consecutive characters are detected using "(.)\1" .
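
A minimal sketch of point 2, assuming the check is run with Matcher.find(). The class name RepeatedCharCheck and the helper hasDoubledChar are made up for illustration. Note that in a Java string literal the backreference is written "\\1".

```java
import java.util.regex.Pattern;

class RepeatedCharCheck {
    // (.) captures any one character; \1 requires the same character again.
    static final Pattern SAME_TWICE = Pattern.compile("(.)\\1");

    // True when some character is immediately followed by itself.
    static boolean hasDoubledChar(String input) {
        return SAME_TWICE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledChar("alex..zang")); // true:  ".."
        System.out.println(hasDoubledChar("alex.zang"));  // false
        System.out.println(hasDoubledChar("alex._zang")); // false: '.' and '_' differ
        System.out.println(hasDoubledChar("hello"));      // true:  "ll"
    }
}
```

As the last two lines show, "(.)\1" on its own flags doubled letters too and does not catch two different special characters in a row, so it would need narrowing before it fits a "no consecutive special chars" rule.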
trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613
Can you explain 2) further by writing an example?

thanks
Pradnya
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    
Err ... Assuming you are referring to my second point - I have given an example!
trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613

So does that mean I need to do the following?
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    

That will only match "a newline followed by a non-special character followed by two identical characters", which is probably not what you want (though I can't be certain, since you have provided only small fragments of a specification and little or no context). I think you need to spend some time with http://docs.oracle.com/javase/tutorial/essential/regex/ and http://www.regular-expressions.info/tutorial.html .
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7064
    

trupti nigam wrote:...The String should not have sequence of multiple consecutive special chars. How do I check this?

It seems you're getting good advice, so I won't try to repeat it.

What I will say is: don't try to do this all in one regex.

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    

Winston Gutkowski wrote:
What I will say is: don't try to do this all in one regex.


The Devil is in the context, but the OP has not provided one. Reading between the lines (i.e. guessing what the OP wants), this should be simple to do in one regex using an "or", so one regex is probably OK, but we will see when the context is posted.
trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613

I am not sure what you mean when you say I have not provided the context. Let me try again.

String name= "alex:zang%^";

Now the regex should detect that, in the above string, the second portion of the string after ":" has consecutive special chars (chars matching [^a-zA-Z0-9:_.-]), and it should reject it.
But if the string is like "alex:zang" or "AlexZang", it should pass the test.
But again, if the second portion of the string, i.e. zang, has any single special char, it will fail with my previous line of code, shown below.



Have I made it clear?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    

Richard Tookey wrote:
Winston Gutkowski wrote:
What I will say is: don't try to do this all in one regex.


The Devil is in the context but the OP has not provided one. Reading between the lines ( i.e. guessing what the OP wants ) this should be simple to do in one regex using an "or" so one regex is probably OK but we will see when the context is posted.


The OP definitely needs to give full details, as there is a lot of missing context. First, the character class in the regex is a negated class, which I assume means that if the regex matches, the validation fails -- the code is looking for invalid characters. Second, the follow-up request, which is looking for consecutive characters, likely means consecutive valid characters -- and there is no way to match both at the same time (never mind this last point; thinking about it some more, I guess it is possible to merge those two cases with the alternation operator).

Yes, it is possible to do both at the same time, but you need to change the logic (as you need the regex to find valid patterns). Then you need to make consecutive special characters invalid.

Henry
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    

trupti nigam wrote:
I am not sure when you say I have not provided the context. Let me try again.


Not providing enough context means that you haven't explained it correctly... for example, your last sub-discussion with Richard is moot, because what Richard interpreted (which was also what I interpreted) is not what you described next.


trupti nigam wrote:
String name= "alex:zang%^";

Now the regex should detect that, in the above string, the second portion of the string after ":" has consecutive special chars (chars matching [^a-zA-Z0-9:_.-]), and it should reject it.
But if the string is like "alex:zang" or "AlexZang", it should pass the test.
But again, if the second portion of the string, i.e. zang, has any single special char, it will fail with my previous line of code, shown below.



Have I made it clear?


So, you want it to be considered as having a special character only if two or more of them exist consecutively? Then your regex should be ... "[^a-zA-Z0-9:_.-]{2,}"
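
A minimal sketch of how the {2,} quantifier behaves, assuming the check is run with Matcher.find(). The class name ConsecutiveSpecials and the helper hasSpecialRun are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

class ConsecutiveSpecials {
    // Two or more characters in a row that are outside the allowed set.
    static final Pattern SPECIAL_RUN = Pattern.compile("[^a-zA-Z0-9:_.-]{2,}");

    // True when the input contains a run of at least two disallowed characters.
    static boolean hasSpecialRun(String input) {
        return SPECIAL_RUN.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpecialRun("alex:zang%^")); // true:  "%^"
        System.out.println(hasSpecialRun("alex:zang"));   // false
        System.out.println(hasSpecialRun("alex%zang"));   // false: a single '%' is not a run
    }
}
```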


Also, this regex doesn't have the concept of "the second portion of the string after ":"" -- meaning it will also trigger if two consecutive special characters occur before the ":". The fix for this issue isn't very difficult, but I am not a fan of using something that you don't understand. I really recommend starting again, with a tutorial on regular expressions.

Henry
trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613


Henry Wong wrote:So, you want it to be considered as having a special character only if two or more of them exist consecutively? Then your regex should be ... "[^a-zA-Z0-9:_.-]{2,}"

Henry


OK, let me rephrase the above.
1. No special chars other than colon, hyphen, period and underscore are entered.
2. The string does not have a sequence of multiple consecutive special chars.

So the check should give the following results:

alexzang ==> pass
alex:zang ==> pass
alex%zang ==> fail
alex.zang ==> pass
alex..zang ==> fail
alex.*zang ==> fail
alex::zang ==> fail
alex$%^zang ==> fail
alex._zang ==> fail


This is what I was told.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18120
    

trupti nigam wrote:
OK, let me rephrase the above.
1. No special chars other than colon, hyphen, period and underscore are entered.
2. The string does not have a sequence of multiple consecutive special chars.

This is what I was told.



This description is completely different from your previous post, and it goes back to the interpretation that Richard and I originally had. I seriously recommend that you get clarification from your instructor, because what you are saying here and your previous post do not match.

Henry
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    

Also, you are going to seriously upset people outside of the US and UK (which is most of the world) who use characters other than those in [A-Za-z] . You should most definitely seek clarification on the specification details.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7064
    

trupti nigam wrote:1. No special chars other than colon, hyphen, period and underscore are entered.
2. The string does not have a sequence of multiple consecutive special chars.

Oddly enough I did get it right, and I repeat: don't try to do both those tests in one regex; it will be horrible.

I also strongly suggest that you make your pattern:
Pattern p = Pattern.compile("[^a-zA-Z0-9:_.-]+");
which will find the longest sequence of characters that match (your current one only matches one character), and use a Matcher to run the logic you need.
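
A minimal sketch of using a Matcher with the "+" pattern to collect each run of disallowed characters, so the surrounding logic can decide what to do with them. The class name InvalidRuns and the helper invalidRuns are assumptions for illustration, not code from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvalidRuns {
    // One or more consecutive characters outside the allowed set.
    static final Pattern INVALID_RUN = Pattern.compile("[^a-zA-Z0-9:_.-]+");

    // Returns every maximal run of disallowed characters, in order of appearance.
    static List<String> invalidRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = INVALID_RUN.matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(invalidRuns("alex$%^zang")); // [$%^]
        System.out.println(invalidRuns("a%b*c"));       // [%, *]
        System.out.println(invalidRuns("alex.zang"));   // []
    }
}
```

An empty list means no disallowed characters were found; the length of each returned run tells you whether it was a single character or a sequence.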

Winston
trupti nigam
Ranch Hand

Joined: Jun 21, 2001
Posts: 613
I am able to satisfy both conditions using the code below.

Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    

trupti nigam wrote:I am able to achieve both the conditions using below code.



Your second one would be simpler as:


Although it doesn't mean the same thing as yours in isolation, you're using these regexes together, and failure occurs if EITHER you have any single char outside the first range OR you have two allowed special characters in a row. So, in the context of using those two regexes together, the result is the same. That is, if your original second regex matched on something other than the allowed special chars, your first one would have matched anyway and already indicated failure.

I guess it's a matter of personal opinion which approach is easier to understand overall.
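
The code from these posts did not survive in the thread, so the following is only a sketch of the two-regex approach being described. Both patterns, the class name TwoStepValidator and the method isValid are assumptions: the first pattern looks for any single char outside the allowed set, the second for two allowed special chars in a row.

```java
import java.util.regex.Pattern;

class TwoStepValidator {
    // Assumed first check: any single character outside the allowed set.
    static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9:_.-]");
    // Assumed second check: two allowed special characters in a row.
    static final Pattern DOUBLED_SPECIAL = Pattern.compile("[:_.-]{2}");

    // Valid only when neither check finds a match.
    static boolean isValid(String input) {
        if (DISALLOWED.matcher(input).find()) {
            return false;
        }
        return !DOUBLED_SPECIAL.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "alexzang", "alex:zang", "alex%zang", "alex.zang", "alex..zang",
            "alex.*zang", "alex::zang", "alex$%^zang", "alex._zang"
        };
        for (String sample : samples) {
            System.out.println(sample + " ==> " + (isValid(sample) ? "pass" : "fail"));
        }
    }
}
```

Because the first check already rejects anything outside the allowed set, the second pattern only needs to list the allowed special chars.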
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 963
    

trupti nigam wrote:
OK, let me rephrase the above.
1. No special chars other than colon, hyphen, period and underscore are entered.
2. The string does not have a sequence of multiple consecutive special chars.

So the check should give the following results:

alexzang ==> pass
alex:zang ==> pass
alex%zang ==> fail
alex.zang ==> pass
alex..zang ==> fail
alex.*zang ==> fail
alex::zang ==> fail
alex$%^zang ==> fail
alex._zang ==> fail


This is what I was told.


This is straightforward and is met by using Matcher.find() with the regex "[^a-zA-Z:._-]|[:._-]{2}" . No need for two separate regexes.
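
A minimal sketch of the single-regex check run against the sample strings from the thread. The class name SingleRegexValidator and the method isValid are made up for illustration. One thing to be aware of: the regex as posted has no 0-9 in the first class, so digits are treated as invalid; the WITH_DIGITS variant is an assumption added here in case digits should be allowed, as in the earlier patterns.

```java
import java.util.regex.Pattern;

class SingleRegexValidator {
    // The regex exactly as posted. It has no 0-9, so digits count as invalid.
    static final Pattern POSTED = Pattern.compile("[^a-zA-Z:._-]|[:._-]{2}");
    // Assumed variant (not from the thread) that also allows digits.
    static final Pattern WITH_DIGITS = Pattern.compile("[^a-zA-Z0-9:._-]|[:._-]{2}");

    // The pattern describes what is INVALID, so a successful find() means failure.
    static boolean isValid(Pattern invalidPattern, String input) {
        return !invalidPattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "alexzang", "alex:zang", "alex%zang", "alex.zang", "alex..zang",
            "alex.*zang", "alex::zang", "alex$%^zang", "alex._zang"
        };
        for (String sample : samples) {
            System.out.println(sample + " ==> " + (isValid(POSTED, sample) ? "pass" : "fail"));
        }
    }
}
```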
 