Use a negative lookahead. Be warned, though, that lookaheads are slippery: you have to make sure they don't look too far ahead. For example, if I had used (?!.*?BAD) in the regex above, it would have failed to match "apple" and "banana", because the lookahead would have seen the BAD in "carBADrot".
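Here is a small Java sketch that shows the reach problem. The sample input "apple,banana,carBADrot,date" is an assumption, because the original test data isn't shown here. The first pattern is the one Alan quotes later in the thread. The second swaps in the unbounded (?!.*?BAD).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadReach {
    // Collects every match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "apple,banana,carBADrot,date"; // assumed sample data
        // \w*? cannot cross a comma, so the lookahead only inspects the current word.
        System.out.println(findAll("\\b(?!\\w*?BAD)\\w+\\b", input)); // [apple, banana, date]
        // .*? does cross commas, so from "apple" or "banana" it still reaches the BAD in "carBADrot".
        System.out.println(findAll("\\b(?!.*?BAD)\\w+\\b", input));   // [date]
    }
}
```

"date" still matches with the unbounded lookahead because no BAD follows it.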
Thanks, Alan, for such a fast reply!
Unfortunately, as you say, the negative lookahead is "slippery" and will not work. Is there any way to stop the regex engine from looking, as you put it, "too far ahead"? In other words, is it possible for the engine to consider only the criteria of the pattern, and not the entire string being examined?
Imagine there were "X"s instead of commas.
The following pattern (just as you predicted) doesn't produce the desired results:
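The sketch below runs the word-character pattern against X-delimited data. Both the sample string and the choice of pattern are assumptions, because the thread doesn't preserve which pattern was tried here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordDelimiterProblem {
    // Collects every match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "appleXbananaXcarBADrotXdate"; // assumed sample data
        // 'X' is itself a word character, so \b and \w see the whole input as
        // one word. The lookahead from position 0 runs straight into BAD, and
        // nothing matches.
        System.out.println(findAll("\\b(?!\\w*?BAD)\\w+\\b", input)); // []
    }
}
```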
It's hard to believe that there isn't an economical and elegant way to do this with the java.util.regex package. It would be great if you or someone else could prove that there is a way. Thanks again!
For the record, the regex that I actually used, "\\b(?!\\w*?BAD)\\w+\\b", works with the input in your first example. This is because "\\w" won't match a comma, so the lookahead can't see past the next delimiter. That won't work with your second example, because the delimiter is a word character. The principle is the same, though: make sure the lookahead doesn't look past the next delimiter, and make sure the matched text is preceded and followed by a delimiter (or by the start or end of the input). Here's a more general approach that will work for your second example:

This regex should work for any data with a single-character delimiter. Just put the real delimiter in place of each 'X', and put the string you want to exclude in place of "BAD". In particular cases you may be able to use your knowledge of the data to express it more economically, as I did with your first example, but I'm afraid elegant is out of the question.
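Alan's exact general-purpose regex isn't preserved in this copy of the thread. The following is one way to apply the principle he describes, offered as a sketch. It uses (?<![^X]) to mean "not preceded by a non-delimiter character", which holds at the start of the input or right after an X. The trailing (?![^X]) is the mirror image for the end of a field. The helper names fieldsWithout and matches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExcludeFields {
    // Hypothetical helper: matches each field between single-character
    // delimiters that does not contain `bad`.
    static Pattern fieldsWithout(char delim, String bad) {
        // Letters and digits are used as-is. Anything else gets a backslash
        // so it is literal inside the character class.
        String d = Character.isLetterOrDigit(delim) ? String.valueOf(delim) : "\\" + delim;
        String notDelim = "[^" + d + "]";
        return Pattern.compile(
            "(?<!" + notDelim + ")"                               // start of input or just after a delimiter
            + "(?!" + notDelim + "*" + Pattern.quote(bad) + ")"   // no `bad` before the next delimiter
            + notDelim + "+"                                      // the field itself
            + "(?!" + notDelim + ")");                            // end of input or just before a delimiter
    }

    // Collects every match of p in input, left to right.
    static List<String> matches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data for both of the thread's examples.
        System.out.println(matches(fieldsWithout('X', "BAD"), "appleXbananaXcarBADrotXdate")); // [apple, banana, date]
        System.out.println(matches(fieldsWithout(',', "BAD"), "apple,banana,carBADrot,date")); // [apple, banana, date]
    }
}
```

For the X case, the built pattern is equivalent to (?<![^X])(?![^X]*BAD)[^X]+(?![^X]). The lookahead is confined to [^X]*, so it can never see past the next delimiter.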
Ah, it really looks like Alan's posts answered your question. For some reason you replaced (?!\\w*?BAD) with (?!.*?BAD), and not surprisingly it didn't work. However, you don't seem to have explained what's wrong with the code Alan actually posted.
I also like Stefan's response. Very often a problem that is hard or impossible to solve with a single regular expression is much simpler with two or more regular expressions and a bit of Java code to tie them together. If you really must have a single regex for this, use negative lookahead as Alan suggests. Otherwise, something like what Stefan suggests will probably be easier to understand and debug.
[ December 14, 2005: Message edited by: Jim Yingst ]
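Stefan's reply isn't included here, so the following is only a guess at the kind of multi-step approach Jim describes. It splits on the delimiter with one regex, then discards fields that contain the unwanted substring using a plain String.contains test. The method name fieldsWithout and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitThenFilter {
    // Hypothetical two-step alternative: split on the delimiter, then drop
    // empty fields and any field that contains the unwanted substring.
    static List<String> fieldsWithout(String input, String delim, String bad) {
        List<String> out = new ArrayList<>();
        // Pattern.quote makes the delimiter literal even if it is a regex metacharacter.
        for (String field : input.split(Pattern.quote(delim))) {
            if (!field.isEmpty() && !field.contains(bad)) {
                out.add(field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fieldsWithout("appleXbananaXcarBADrotXdate", "X", "BAD")); // [apple, banana, date]
        System.out.println(fieldsWithout("apple,banana,carBADrot,date", ",", "BAD")); // [apple, banana, date]
    }
}
```

Each step is trivial to read and test on its own. That is the readability advantage Jim is pointing at, compared with a single lookaround-heavy pattern.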
subject: regex pattern to exclude certain substrings from matches