Regex in Java: finding expression 1, but not if it's part of expression 2

Tim Quinn
Greenhorn

Joined: Sep 11, 2007
Posts: 6
Hi all,

I'm having trouble writing a regular expression in Java. I want to find all instances of the string cde, but NOT if it's part of the string abcdefg.

So if I gave my parser the text abcdefgcdex, it should only find the second "cde" (the one just before the x), not the one inside abcdefg.

Is that possible? I'm really not sure how to go about creating such an expression.


Any help much appreciated! Thanks!
Tim
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
It's possible using negative lookahead or negative lookbehind. These are poorly explained in Java's regex API, but you can find more info here.

Basically you can create an expression where you find cde, plus two more characters, and then look back and make sure the whole thing doesn't look like abcdefg. If you can get this working, it's further complicated by the possibility that cde is at the end of the string, or just one char from the end. This will require a more complex regex that uses | to allow for two possibilities.

Is there any reason why this all has to be done in one regex? In general, I'd say that it's probably easier to use two regexes plus some Java code. Search for cde, and search for abcdefg, and if you find both, check if they overlap. In all likelihood this will be easier for most people to understand when they look at your code. Lookahead and lookbehind are powerful tools, but not widely known among programmers.
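A minimal sketch of that two-regex idea, assuming the excluded string is always the literal abcdefg. The class name OverlapFilter and the method findStandalone are made up for illustration; nothing here comes from the original posts beyond the approach itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class OverlapFilter {
    private static final Pattern TARGET = Pattern.compile("cde");
    private static final Pattern EXCLUDED = Pattern.compile("abcdefg");

    /** Returns the start index of every "cde" that does not lie inside an "abcdefg". */
    static List<Integer> findStandalone(String text) {
        // First pass: record the [start, end) range of every excluded string.
        List<int[]> excludedRanges = new ArrayList<>();
        Matcher ex = EXCLUDED.matcher(text);
        while (ex.find()) {
            excludedRanges.add(new int[] {ex.start(), ex.end()});
        }

        // Second pass: keep only the target matches that fall outside every range.
        List<Integer> result = new ArrayList<>();
        Matcher m = TARGET.matcher(text);
        while (m.find()) {
            boolean inside = false;
            for (int[] range : excludedRanges) {
                if (m.start() >= range[0] && m.end() <= range[1]) {
                    inside = true;
                    break;
                }
            }
            if (!inside) {
                result.add(m.start());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findStandalone("abcdefgcdex")); // prints [7]
    }
}
```

The nested loop is quadratic in the worst case, which should be fine for short inputs; for long texts the two match lists could be walked in parallel instead, since both come back in ascending order.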
[ September 11, 2007: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Tim Quinn
Greenhorn

Joined: Sep 11, 2007
Posts: 6
Doing it in two stages is a good idea that I hadn't really thought of -- I was getting quite far on the look-aheads and look-behinds, until I discovered that it's very difficult to put variable-length regexes in the look-behinds (I kept getting the exception "Look-behind group does not have an obvious maximum length").

But my question is now this: the reason I'm looking for these words is to replace them, using String.replaceAll(). How would I get it to ignore the words I want it to ignore?

I could maybe go through it once finding the words I want to ignore, create a list of ranges I don't want it to look at, and then... what?

I can't see any way of saying "replace all these words, except if they are in this situation", unless I can create a single regex expression.

Any thoughts?
Thanks!

Tim
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Tim]: But my question is now this: the reason I'm looking for these words is to replace them, using String.replaceAll(). How would I get it to ignore the words I want it to ignore?

Well, it's possible to replace the replaceAll() with a loop that uses appendReplacement() and appendTail() (in the Matcher class). See the API for appendReplacement(), or look at the source code for Matcher's replaceAll() method. You could then insert some additional logic into the loop to control whether or not the replacement occurs. However, this does have some complications; it might be easier to revisit the single regex approach.
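A rough sketch of such a loop, assuming the only case to skip is a cde sitting inside abcdefg. SelectiveReplace and replaceStandalone are hypothetical names; appendReplacement(), appendTail() and quoteReplacement() are the real Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class SelectiveReplace {
    private static final Pattern TARGET = Pattern.compile("cde");

    /** Replaces every "cde" except those that are part of "abcdefg". */
    static String replaceStandalone(String text, String replacement) {
        Matcher m = TARGET.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // "cde" starts two characters into "abcdefg", so look two characters back.
            // startsWith returns false for a negative offset, so no bounds check is needed.
            boolean partOfExcluded = text.startsWith("abcdefg", m.start() - 2);
            // Re-emit the match untouched when it should be skipped.
            String emitted = partOfExcluded ? m.group() : replacement;
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(emitted));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceStandalone("abcdefgcdex", "XYZ")); // prints abcdefgXYZx
    }
}
```

StringBuffer is used because the appendReplacement() overload taking a StringBuilder only exists from Java 9 onwards.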

[Tim]: Doing it in two stages is a good idea that I hadn't really thought of -- I was getting quite far on the look-aheads and look-behinds, until I discovered that it's very difficult to put variable-length regexes in the look-behinds (I kept getting the exception "Look-behind group does not have an obvious maximum length").

Yeah, I hate those. Especially since they sometimes occur in cases where I'm sure that the regex does have a maximum length. However, I'm pretty sure this can be done with a regex of finite length that's obvious even to the regex parser. There should be no need to use any * or + quantifier to allow an unbounded length here. Can you show some examples of expressions you've tried?
Tim Quinn
Greenhorn

Joined: Sep 11, 2007
Posts: 6
Originally posted by Jim Yingst:

Well, it's possible to replace the replaceAll() with a loop that uses appendReplacement() and appendTail() (in the Matcher class). See the API for appendReplacement(), or look at the source code for Matcher's replaceAll() method. You could then insert some additional logic into the loop to control whether or not the replacement occurs. However, this does have some complications; it might be easier to revisit the single regex approach.

I should probably have waited for your answer or worked that out myself. Instead I coded it myself in a loop -- for each match, make sure I really want to replace it, if so, replace the section by concatenating the original string's first part, the new replacement, and the original string's last part, and keep track of how the size of the string changes so I'm always replacing the correct section...
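The actual code was not posted, so the following is only a guess at the general shape of such a loop. The names ManualSplice, replaceStandalone and shift are invented, and the "do I really want to replace it" check is assumed to be the abcdefg test from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction; not the code described in the post.
final class ManualSplice {
    /** Replaces every "cde" except those that are part of "abcdefg". */
    static String replaceStandalone(String text, String replacement) {
        // Matches are found in the original text, so inserted replacements are never re-scanned.
        Matcher m = Pattern.compile("cde").matcher(text);
        String result = text;
        int shift = 0; // how far positions in result have moved relative to text
        while (m.find()) {
            // Skip matches that sit inside "abcdefg" (which starts two characters earlier).
            if (text.startsWith("abcdefg", m.start() - 2)) {
                continue;
            }
            int start = m.start() + shift;
            int end = m.end() + shift;
            // First part of the string + replacement + last part of the string.
            result = result.substring(0, start) + replacement + result.substring(end);
            // Track how the size of the string changed.
            shift += replacement.length() - (end - start);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceStandalone("cdeabcdefgcde", "Z")); // prints ZabcdefgZ
    }
}
```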

It took a bit of fiddling, but it seems to work pretty fast. It's probably not as understandable to future programmers as your method would be, though.

Thanks for your help!
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
For what it's worth, here's a regex for you:

cde(?:.?$|..(?<!abcdefg))
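One thing to watch for with replaceAll(): this regex consumes up to two characters after cde, so those characters get replaced as well. The sketch below compares it with an alternative that uses only zero-width lookarounds, (?<!ab)cde|cde(?!fg). That alternative is an assumption added here, not something from the thread, and LookaroundDemo and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class LookaroundDemo {
    // The regex from the post above, unchanged.
    static final Pattern POSTED = Pattern.compile("cde(?:.?$|..(?<!abcdefg))");

    // Alternative (an assumption, not from the thread): match "cde" when it is
    // not preceded by "ab", or when it is not followed by "fg". Only a "cde"
    // with both neighbours present, i.e. inside "abcdefg", is left out.
    static final Pattern LOOKAROUND_ONLY = Pattern.compile("(?<!ab)cde|cde(?!fg)");

    static String replace(Pattern pattern, String text, String replacement) {
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // The posted regex matches "cdex", so the trailing x is replaced too.
        System.out.println(replace(POSTED, "abcdefgcdex", "Z"));          // prints abcdefgZ
        // The lookaround-only version matches exactly "cde".
        System.out.println(replace(LOOKAROUND_ONLY, "abcdefgcdex", "Z")); // prints abcdefgZx
    }
}
```

Both lookarounds in the alternative have a fixed length, so the "obvious maximum length" error should not come up.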
 
 