[regex] Select words of more than 3 letters between two other words

 
mark smith
Ranch Hand
Posts: 264
Hi,

With this regex, I can tell whether a text contains the words ice and snow but not tree and ski



I want to get the words between ice and snow that have more than 3 letters (the text must not contain tree and ski).

Is there a way to do it with a regex?

Thanks
 
Stephan van Hulst
Saloon Keeper
Posts: 15510
363
Why do you want to use a regex? Is it not allowed for "tree" or "ski" to be anywhere in the input, or just not in the part between ice and snow? What if there are multiple instances of the words ice and snow? Do you also want text between "ice" and "ice"? Can the order of "ice" and "snow" be reversed?

Please give us more information on the requirements and circumstances you're working with, and why you're trying to achieve this in the first place.
 
mark smith
Ranch Hand
Posts: 264

Stephan van Hulst wrote:Why do you want to use a regex? Is it not allowed for "tree" or "ski" to be anywhere in the input, or just not in the part between ice and snow? What if there are multiple instances of the words ice and snow? Do you also want text between "ice" and "ice"? Can the order of "ice" and "snow" be reversed?

Please give us more information on the requirements and circumstances you're working with, and why you're trying to achieve this in the first place.



I'm not a regex expert, but I thought it would take less time to write a regex than to write a function that does the same thing.

As in the regex I posted:

order is not important
ice, snow, tree, ski can be anywhere


I don't need to handle multiple instances of the words ice and snow, but that would be a plus...

I don't want the text between ice and ice... only between ice and snow, or snow and ice...
 
Stephan van Hulst
Saloon Keeper
Posts: 15510
363
Well, you definitely don't want to do this with one regex. Even when you break it up it's going to look really ugly. Take a look:
 
mark smith
Ranch Hand
Posts: 264

Stephan van Hulst wrote:Well, you definitely don't want to do this with one regex. Even when you break it up it's going to look really ugly. Take a look:



It doesn't seem to work correctly, because with the input = ice hello house snow tree ski
it matches.... but it should not, because tree and ski are both present....

Also,

on the result, I can loop over every word and display it only if the word has more than 3 letters..... but is there a way to do it directly in the regex?
 
Henry Wong
author
Posts: 23951
142

mark smith wrote: On the result, I can loop over every word and display it only if the word has more than 3 letters..... but is there a way to do it directly in the regex?




Ignore me if I seem to be the only one ... but having read the topic posts, I am still not clear what is being asked for here. Could you show us a bunch of examples? Input, and expected output?

Henry
 
mark smith
Ranch Hand
Posts: 264

Both ski and tree need to be present for the input to be bad....

snow hello house ice ski tree -> bad
snow the hello house ice ski -> return the words between snow and ice that have more than 3 letters, so hello and house are returned
snow the hello house ice -> return the words between snow and ice that have more than 3 letters, so hello and house are returned

It needs to work fine even when only one of the two bad words is there.....


To get only the words with more than 3 letters, I tried without success:

 
Henry Wong
author
Posts: 23951
142

mark smith wrote:
Both ski and tree need to be present for the input to be bad....

snow hello house ice ski tree -> bad
snow the hello house ice ski -> return the words between snow and ice that have more than 3 letters, so hello and house are returned
snow the hello house ice -> return the words between snow and ice that have more than 3 letters, so hello and house are returned

It needs to work fine even when only one of the two bad words is there.....



Oh, I see now.

mark smith wrote:
It doesn't seem to work correctly, because with the input = ice hello house snow tree ski
it matches.... but it should not, because tree and ski are both present....



You will need to modify the bad regex to match only when both bad words are present. Currently, it is one or the other.
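
A minimal sketch of what "both bad words present" could look like, assuming the check is done with two lookaheads. The class name `BadWordCheck`, the method `isBad` and the pattern itself are illustrative assumptions, not the code posted earlier in this thread:

```java
import java.util.regex.Pattern;

final class BadWordCheck {
    // Hypothetical "bad" pattern: each lookahead scans the input for one word,
    // so the pattern only matches when BOTH "tree" and "ski" occur somewhere.
    private static final Pattern BAD =
            Pattern.compile("(?=.*\\btree\\b)(?=.*\\bski\\b)", Pattern.DOTALL);

    // Returns true only if the input contains both bad words as whole words.
    static boolean isBad(String input) {
        return BAD.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isBad("snow hello house ice ski tree")); // true
        System.out.println(isBad("snow the hello house ice ski"));  // false
        System.out.println(isBad("snow the hello house ice"));      // false
    }
}
```

Two plain `contains` checks joined with `&&` would arguably be easier to read; the lookahead form is shown only because the thread is about regexes.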

mark smith wrote:
On the result, I can loop over every word and display it only if the word has more than 3 letters..... but is there a way to do it directly in the regex?



Not really. Regexes are not really good at returning an unknown number of results from a single match. You have to rewrite it to loop yourself -- and probably to check the edge tokens yourself too.

mark smith wrote:
I'm not a regex expert, but I thought it would take less time to write a regex than to write a function that does the same thing.



I guess you are starting to realize that this may not be true.

Henry
 
Henry Wong
author
Posts: 23951
142

mark smith wrote:
To get only the words with more than 3 letters, I tried without success:




"^{0,3}\\w" -- means zero to three of the beginning of input marker followed by a single word character. Of course, this makes no sense, since there is no way that the beginning of input marker can appear after an edge token, especially if you want more than one of it.

Henry
 
Henry Wong
author
Posts: 23951
142

Henry Wong wrote:

mark smith wrote:
On the result, I can loop over every word and display it only if the word has more than 3 letters..... but is there a way to do it directly in the regex?



Not really. Regexes are not really good at returning an unknown number of results from a single match. You have to rewrite it to loop yourself -- and probably to check the edge tokens yourself too.



I guess another way to do this is... use regex to capture the phrase between the two edges, then use regex on the phrase to get all words greater than three letters.

Henry
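
The two-pass idea could be sketched roughly like this. The edge pattern, the `\w{4,}` word pattern and every name here (`BetweenEdges`, `longWordsBetween`) are assumptions made for illustration, and the sketch leaves out the tree/ski check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BetweenEdges {
    // Pass 1 (assumed pattern): "ice ... snow" or "snow ... ice", capturing the middle.
    private static final Pattern EDGES =
            Pattern.compile("\\b(?:ice\\b(.*)\\bsnow|snow\\b(.*)\\bice)\\b");
    // Pass 2 (assumed pattern): words of four or more word characters.
    private static final Pattern LONG_WORD = Pattern.compile("\\w{4,}");

    static List<String> longWordsBetween(String input) {
        List<String> words = new ArrayList<>();
        Matcher edges = EDGES.matcher(input);
        if (!edges.find()) {
            return words; // no ice/snow pair: nothing to extract
        }
        // Only one of the two groups took part in the match; the other is null.
        String phrase = edges.group(1) != null ? edges.group(1) : edges.group(2);
        Matcher word = LONG_WORD.matcher(phrase);
        while (word.find()) { // loop, because the number of words is not known up front
            words.add(word.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(longWordsBetween("snow the hello house ice"));      // [hello, house]
        System.out.println(longWordsBetween("ice the hello house test snow")); // [hello, house, test]
    }
}
```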
 
mark smith
Ranch Hand
Posts: 264

Henry Wong wrote:

I guess another way to do this is... use regex to capture the phrase between the two edges, then use regex on the phrase to get all words greater than three letters.

Henry



I thought I could replace (.*) in the code below



by (?=\\w{4,}\\b)

(.*) is the part of the regex that captures the sentence between the two edges, no?
 
Henry Wong
author
Posts: 23951
142

mark smith wrote:
(.*) is the part of the regex that captures the sentence between the two edges, no?



Yes, it captures the phrase between the two edges. With it, you can use another regex to get the words that are longer than three letters. You will not be able to capture the words in the same pass, because you have an indeterminate number of words.

Henry
 
Stephan van Hulst
Saloon Keeper
Posts: 15510
363
Why do you need to do all of this with as few (horrible, long, unreadable) regular expressions as possible?

Just write readable code. In the example I have given (which apparently doesn't work exactly right, but I'm sure you can change that), you can simply perform some operations on the capture returned by the 'good' pattern, as Henry already implied. This would be *much* preferable to doing it in a horrible, long, unreadable regex, even *if* you could do it with one regex in the first place.
 
mark smith
Ranch Hand
Posts: 264

Henry Wong wrote:

mark smith wrote:
(.*) is the part of the regex that captures the sentence between the two edges, no?



Yes, it captures the phrase between the two edges. With it, you can use another regex to get the words that are longer than three letters. You will not be able to capture the words in the same pass, because you have an indeterminate number of words.

Henry



I added another pattern: splittedWord

Then I tried to run a matcher on the value returned by the good matcher...




splitedMatcher.matches() always returns false...
I don't understand why.

My input was: ice the hello house test snow
 
Rob Spoor
Sheriff
Posts: 22783
131
matcher.group(1) is " the hello house test " (including leading and trailing spaces). That certainly does not match your splittedWords pattern. It could find a few results ("hello", "house", "test"), but that's not what you're doing right now.

Edit: I misread the splittedWords pattern. It wouldn't cause "hello", "house" and "test" to be found, but instead empty strings just before those words. After all, you're using a positive lookahead.
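
To illustrate the point about lookaheads, here is a small sketch that runs `find()` with the lookahead quoted above and with a plain consuming pattern on the same phrase. The helper `findAll` and the class name are hypothetical, and `\w{4,}` is only an assumed alternative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Collects the text of every match that find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = " the hello house test ";
        // A lookahead asserts without consuming, so every match is an empty string.
        System.out.println(findAll("(?=\\w{4,}\\b)", phrase));
        // A consuming pattern returns the words themselves: [hello, house, test]
        System.out.println(findAll("\\w{4,}", phrase));
    }
}
```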
 
Henry Wong
author
Posts: 23951
142

Rob Spoor wrote:Edit: I misread the splittedWords pattern. It wouldn't cause "hello", "house" and "test" to be found, but instead empty strings just before those words. After all, you're using a positive lookahead.




Yeah. The original post had a regex that contains both positive and negative look-aheads. I am surprised that the OP doesn't know (or forgot) that look-aheads (and look-behinds) are non-capturing.

Henry
 
mark smith
Ranch Hand
Posts: 264

Rob Spoor wrote:matcher.group(1) is " the hello house test " (including leading and trailing spaces). That certainly does not match your splittedWords pattern. It could find a few results ("hello", "house", "test"), but that's not what you're doing right now.

Edit: I misread the splittedWords pattern. It wouldn't cause "hello", "house" and "test" to be found, but instead empty strings just before those words. After all, you're using a positive lookahead.



This code should split the sentence and get all the words, no?


I'm lost.

I tried a couple of solutions on http://www.regexplanet.com/ but they always fail.
 
Henry Wong
author
Posts: 23951
142

mark smith wrote: This code should split the sentence and get all the words, no?



no

mark smith wrote: I'm lost.

I tried a couple of solutions on http://www.regexplanet.com/ but they always fail.



It may be a good idea to start with a good tutorial on regular expressions. Regex is not something that can be learned by trial and error.

Henry
 
mark smith
Ranch Hand
Posts: 264

I used:


That works, but surely there is a better way to do it.

I will buy a book.
 
Rob Spoor
Sheriff
Posts: 22783
131
That regex looks pretty good to me. It's exactly what you want: words that contain 3 or more letters.
 
mark smith
Ranch Hand
Posts: 264

Rob Spoor wrote:That regex looks pretty good to me. It's exactly what you want: words that contain 3 or more letters.



Why does it return false when I check with splitedMatcher.matches()?
 
Henry Wong
author
Posts: 23951
142

mark smith wrote:

Rob Spoor wrote:That regex looks pretty good to me. It's exactly what you want: words that contain 3 or more letters.



Why does it return false when I check with splitedMatcher.matches()?



Because the matches() and find() methods are not the same thing. The matches() method determines whether the regex matches the whole input string. The find() method searches for the next substring in the input that matches; it returns true if there is one, and the matched text is then available as group zero.

Henry
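
A short sketch of the difference, assuming a word pattern such as `\w{4,}` (the pattern actually used in the thread is not shown, and the class and method names are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchesVsFind {
    // Assumed word pattern: four or more word characters.
    private static final Pattern LONG_WORD = Pattern.compile("\\w{4,}");

    // matches(): true only if the ENTIRE input is one match.
    static boolean matchesWhole(String input) {
        return LONG_WORD.matcher(input).matches();
    }

    // find(): advances to the next matching substring on each call.
    static int countFound(String input) {
        Matcher m = LONG_WORD.matcher(input);
        int count = 0;
        while (m.find()) {
            count++; // m.group() (group zero) holds the matched text here
        }
        return count;
    }

    public static void main(String[] args) {
        String phrase = " the hello house test ";
        System.out.println(matchesWhole(phrase));  // false: spaces and "the" do not fit
        System.out.println(matchesWhole("hello")); // true: the whole input is one word
        System.out.println(countFound(phrase));    // 3: hello, house, test
    }
}
```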
 
Henry Wong
author
Posts: 23951
142

mark smith wrote: That works, but surely there is a better way to do it.



There is always room for improvement. For example, since the find() goes from left to right, and the regex is greedy, you really don't need the two word boundary specifiers.

Henry
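
As a quick check of that claim, the sketch below compares a pattern with and without the word boundaries. It assumes the pattern under discussion looked something like `\b\w{3,}\b`, which is a guess based on the replies; `findAll` and the class name are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryCheck {
    // Collects the text of every match that find() reports for the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = " the hello house test ";
        List<String> withBoundaries = findAll("\\b\\w{3,}\\b", phrase);
        List<String> withoutBoundaries = findAll("\\w{3,}", phrase);
        // The greedy quantifier always consumes a whole run of word characters,
        // and find() scans left to right, so a match never starts mid-word.
        System.out.println(withBoundaries);                           // [the, hello, house, test]
        System.out.println(withBoundaries.equals(withoutBoundaries)); // true
    }
}
```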
 