treating variable as regex

Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Hi,

I have a requirement where I need to check whether a string contains a word repeated three times consecutively (for example, "hello are are are you there" should be treated as "hello are you there"). Is there any way I can treat the word "are" as a regex? What I mean is, can I treat a variable as a regex? Or is there a better way to implement this?

TIA,
Ankit
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
is this okay?
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Hi Harsha,

Thanks for the response. But the string in your example is a constant, while in my case the string will be fetched from a database, so it can be anything. How do I choose the regex ("are are are" in your case) dynamically?
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Instead of hard-coding the string, use a variable, like this:

String s = valueFromDatabase; // placeholder for the value retrieved from the database

StringBuilder sb = new StringBuilder(s);

// append the word two more times, separated by single spaces
for (int i = 1; i < 3; i++) {
    sb.append(" ");
    sb.append(s);
}

String regex = sb.toString();

Here s is also the replacement.
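A minimal, runnable sketch of the approach above. The class and method names are made up for illustration, and the use of Pattern.quote and Matcher.quoteReplacement is an addition not in the original post: a value coming from a database may contain regex metacharacters such as "." or "$", and quoting makes the word match literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TripleWordCollapser {
    // Builds the pattern "word word word" from a variable and replaces
    // each occurrence with a single "word".
    static String collapseTriple(String text, String word) {
        String quoted = Pattern.quote(word); // treat the word literally
        String regex = quoted + " " + quoted + " " + quoted;
        return text.replaceAll(regex, Matcher.quoteReplacement(word));
    }

    public static void main(String[] args) {
        // prints "hello are you there"
        System.out.println(collapseTriple("hello are are are you there", "are"));
    }
}
```

This still requires knowing the repeated word in advance and only handles single spaces between the copies.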
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 20049
    

You need to use a capturing group, and then check if the contents of that group reappear. Allowing only whitespace between the words:
The (\\w+) part captures one single word. The \\s+ means one or more occurrences of whitespace. The \\1 means the exact same value as the captured word.
Replace the \\s+ with something else to also allow other characters; for instance, [\\s,]+ also allows commas.
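The snippet that accompanied this post is not preserved in this copy of the thread. A sketch consistent with the description, assuming the pattern was assembled from exactly the three pieces named above (the class and method names are hypothetical):

```java
final class ThriceDemo {
    // (\w+) captures a word; \s+\1 requires whitespace followed by the
    // same word again. Two backreferences give three occurrences in total.
    static final String THRICE = "(\\w+)\\s+\\1\\s+\\1";

    static String collapse(String text) {
        // $1 is the captured word, so the three copies become one
        return text.replaceAll(THRICE, "$1");
    }

    public static void main(String[] args) {
        // prints "hello are you there"
        System.out.println(collapse("hello are are are you there"));
    }
}
```

No knowledge of the repeated word is needed here; the backreference \1 picks up whatever word the group captured.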


Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Hi Rob,

Thanks for your suggestion; that works absolutely fine when I have a string like "Hi how are are are you". But it doesn't return anything when I try strings like "Hi How, are, are, are, you". How do I handle such cases?


Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Do you want to replace all the strings that occur 3 or more consecutive times or only the target string?
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
I want to replace all the strings that occur multiple times (more than once). For example:

i i am am am am here
should be rendered as i am here

i, i, i, am here
should be rendered as i, am here

i am here
should be rendered as i am here
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
REGEX
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 20049
    

If it's two or more, you use a quantifier for that: "(\\w+)(\\s+\\1)+". The whitespace + repetition is then required one or more times.
If you want the comma inside the match, add that to the \\w+: "(\\w+,?)". The ? makes the comma optional.

However, that will give problems with cases like "I, I, I am". The last "I" does not match the starting "I,", so replacing would give you "I, I am". Putting the comma with the whitespace (as I had already mentioned) will solve that; "I, I, I am" will become "I am", and "I, I, I, am" will become "I, am" because the last comma is not part of the match.
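A sketch of the "comma with the whitespace" variant, checked against the examples in this thread. The \\b word boundaries are an addition that is not part of the patterns quoted above: without them the backreference can match inside a longer word, so "this is" would collapse to "this". The class and method names are hypothetical.

```java
final class RepeatedWordCollapser {
    // A word, followed one or more times by separators (whitespace or
    // commas) and the same word again. \b keeps matches on whole words.
    static final String REPEATED = "\\b(\\w+)([\\s,]+\\1\\b)+";

    static String collapse(String text) {
        return text.replaceAll(REPEATED, "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("i i am am am am here")); // i am here
        System.out.println(collapse("I, I, I am"));           // I am
        System.out.println(collapse("I, I, I, am"));          // I, am
        System.out.println(collapse("this is"));              // this is
    }
}
```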
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8661
    

Ankit Chandrawat wrote: I want to replace all the strings that occur multiple times (more than once). For example:

i i am am am am here
should be rendered as i am here...

Yes, but the problem is your rules aren't complete. Is this only for space-delimited words?
For example, what would you want to do with:
i i amamamam here
?

Winston


Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
I need to search a string if it has a word repeated thrice consecutively

I want to replace all the strings that occurs multiple times(more than once)


In software development, the specs keep changing
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Space is the delimiter. As for the rules, all the spec says is:

"Words repeated multiple times consecutively should be considered as one"

Now, the definition of a word can be any of:

am
,am
am,
,am,

The character "," is just an example of a special character. So, let's just replace "word" with "string",

which turns the rule into:

Strings repeated multiple times consecutively should be considered as one.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 20049
    

Use "(\\S+)" as the first part. Where "\\s" means whitespace, "\\S" means anything but whitespace.
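A sketch of the \\S+ variant for space-delimited strings. The lookarounds (?<!\\S) and (?!\\S) are an assumption added here, not part of the suggestion above: they force the match to start and end on whole tokens, since \\b does not help when a token begins or ends with a character like ",". The class and method names are hypothetical.

```java
final class RepeatedTokenCollapser {
    // A run of non-whitespace characters, repeated one or more times with
    // whitespace in between. The lookarounds pin both ends of the match
    // to token edges (start/end of input or whitespace).
    static final String REPEATED = "(?<!\\S)(\\S+)(\\s+\\1)+(?!\\S)";

    static String collapse(String text) {
        return text.replaceAll(REPEATED, "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("i, i, i, am here"));    // i, am here
        System.out.println(collapse(",am, ,am, ,am, here")); // ,am, here
        System.out.println(collapse("this is"));             // this is
    }
}
```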
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Thanks Rob, that really worked. Just out of curiosity, is it possible to collapse the string to a single occurrence only if it appears, say, 4 times? That is, we would be putting a limit on the multiplicity of the string.
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
Can you tell me if this works?
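The code posted with this reply is not preserved in this copy of the thread. As a rough sketch only, here is what a plain-Java approach with a simple regex could look like under one reading of the question: a run of identical tokens is collapsed only when it is exactly runLength tokens long. The class name, method name, and runLength parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RunCollapser {
    // Splits on whitespace, then walks the tokens run by run. A run of
    // exactly runLength equal tokens is emitted once; any other run is
    // emitted unchanged.
    static String collapseExactRuns(String text, int runLength) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int j = i;
            // advance j to the end of the current run of equal tokens
            while (j < tokens.length && tokens[j].equals(tokens[i])) {
                j++;
            }
            int run = j - i;
            int copies = (run == runLength) ? 1 : run;
            for (int k = 0; k < copies; k++) {
                out.add(tokens[i]);
            }
            i = j;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(collapseExactRuns("i am am am am here", 4)); // i am here
        System.out.println(collapseExactRuns("i am am am here", 4));    // i am am am here
    }
}
```

Note that this version normalizes every stretch of whitespace to a single space, which the regex versions do not.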
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
Ya Harsha, this one worked really well. Thanks a lot.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8661
    

Ankit Chandrawat wrote:Ya Harsha, this one worked really well. Thanks a lot.

A good lesson. Regexes are great, but not for everything. Sometimes the simplest is the best.

Winston
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
What is more understandable: a complex regex, or regular Java code with a simple regex? Which is easier to maintain?
Ankit Chandrawat
Ranch Hand

Joined: Jan 03, 2008
Posts: 88
I have always been a regular Java guy. Complicated regex makes me sort of uncomfortable. But the great thing is we have great solutions available in both the forms.
 