Regex - match group that contains word AND does not contain another word
Ioan Damian Sirbu
Greenhorn
Joined: Dec 22, 2008
Posts: 18
Greetings,
I am having trouble finding a regex pattern that matches any text containing one group of letters while not containing another group of letters.
Iterating over a file line by line, I need to extract the lines that contain the word 'input' AND do not contain the word 'type'.
So, 'input damian whatever' is a match, while 'input damian type whatever' is not.
Any ideas?
Vivek Singh
Ranch Hand
Joined: Oct 27, 2009
Posts: 92
Why is a regular expression required?
Since you know the exact text you need, a plain string check should do.
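A minimal sketch of the plain-string approach, assuming simple substring checks are good enough. The class and method names are illustrative, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ContainsFilter {

    // Keep a line only if it contains "input" and does not contain "type".
    // Plain substring checks: no regex involved.
    static boolean keep(String line) {
        return line.contains("input") && !line.contains("type");
    }

    // Read the text line by line and collect the lines that pass keep().
    static List<String> filter(String text) throws IOException {
        List<String> kept = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (keep(line)) {
                    kept.add(line);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String text = "input damian whatever\ninput damian type whatever\nno match here";
        System.out.println(filter(text)); // [input damian whatever]
    }
}
```

Note that substring checks also reject a line such as "prototype input", because "prototype" contains "type"; if whole words matter, a regex with word boundaries is the better tool.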
Ioan Damian Sirbu
Greenhorn
Joined: Dec 22, 2008
Posts: 18
That was just an arbitrary example.
The concrete situation is:
- I need to search all files in Eclipse.
- I need to find the files that contain a custom tag like <input type="calendar"> but not like <input type="calendar" theme="simple">.
I think Eclipse's regex search is an option, and at the same time this regex dilemma is interesting in itself.
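For the concrete tag search, one possible sketch (not from the thread) keeps a negative lookahead inside a single tag by using [^>]* instead of .*. It assumes attribute values are double-quoted and never contain '>'; whether Eclipse's file search accepts exactly this syntax is also an assumption, since the sketch only runs it through java.util.regex.

```java
import java.util.regex.Pattern;

public class CalendarTagFinder {

    // Matches an <input ...> tag that has type="calendar" but no theme="simple".
    // [^>]* keeps both the lookahead and the match inside a single tag,
    // so attributes of neighbouring tags cannot affect the result.
    static final Pattern CALENDAR_WITHOUT_SIMPLE = Pattern.compile(
            "<input\\b(?![^>]*\\btheme=\"simple\")[^>]*\\btype=\"calendar\"[^>]*>");

    // find() reports whether such a tag occurs anywhere in the text.
    static boolean containsTag(String text) {
        return CALENDAR_WITHOUT_SIMPLE.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<input type=\"calendar\">",
            "<input type=\"calendar\" theme=\"simple\">",
            "<input theme=\"simple\" type=\"calendar\">",
            "<input type=\"text\"><input type=\"calendar\">"
        };
        for (String s : samples) {
            System.out.println(containsTag(s) + "  " + s);
        }
    }
}
```

Because the lookahead sits right after "<input", attribute order does not matter: theme="simple" is rejected whether it comes before or after type="calendar".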
Ioan Damian Sirbu
Greenhorn
Joined: Dec 22, 2008
Posts: 18
So, any ideas?
|
 |
Jeanne Boyarsky
internet detective
Marshal
Joined: May 26, 2003
Posts: 26169
Damian Sarbu wrote: So, any ideas?
Yes: negative lookahead/lookbehind.
It asserts that a given pattern does not match starting at (lookahead) or ending at (lookbehind) the current position, without consuming any characters.
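Applied to the original line-filtering example, a minimal sketch puts the negative lookahead at the start of the line, so it can scan the whole line for "type" before the rest of the pattern looks for "input". The \b word boundaries are an assumption (so that words like "prototype" are not treated as "type"); the class name is illustrative.

```java
import java.util.regex.Pattern;

public class LookaheadLineFilter {

    // ^(?!.*\btype\b) : negative lookahead anchored at the start of the line;
    //                   the match fails if the word "type" appears anywhere.
    // .*\binput\b.*$  : the line must still contain the word "input".
    static final Pattern INPUT_WITHOUT_TYPE =
            Pattern.compile("^(?!.*\\btype\\b).*\\binput\\b.*$");

    // matches() requires the whole line to match the pattern.
    static boolean keep(String line) {
        return INPUT_WITHOUT_TYPE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(keep("input damian whatever"));      // true
        System.out.println(keep("input damian type whatever")); // false
        System.out.println(keep("type first, then input"));     // false
    }
}
```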
Ioan Damian Sirbu
Greenhorn
Joined: Dec 22, 2008
Posts: 18
Thank you. I actually found a good tutorial right here: http://www.javaranch.com/journal/2003/04/RegexTutorial.htm
For whoever is interested, the regex should look like this:
((.*calendar.*)(?! .*simple.*))
Ireneusz Kordal
Ranch Hand
Joined: Jun 21, 2008
Posts: 423
Damian Sarbu wrote: For whoever is interested, the regex should look like this
((.*calendar.*)(?! .*simple.*))
Maybe this pattern works in Eclipse, but in Java it doesn't work as you expect:
Results:
true
false
true
true
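The inputs behind the results above are unknown, so the following sketch uses its own sample strings (an assumption) to show one way the posted pattern misbehaves in Java: the greedy .*calendar.* consumes the entire input, the lookahead is then evaluated at the end of the string, finds nothing, and the negative check trivially succeeds.

```java
import java.util.regex.Pattern;

public class GreedyLookaheadDemo {

    // The pattern exactly as posted, including the space inside the lookahead.
    static final Pattern POSTED = Pattern.compile("((.*calendar.*)(?! .*simple.*))");

    // matches() requires the whole string to match.
    static boolean wholeStringMatches(String text) {
        return POSTED.matcher(text).matches();
    }

    public static void main(String[] args) {
        // .*calendar.* swallows " simple" too, so the lookahead runs at the
        // very end of the string, sees nothing, and the negative check passes.
        System.out.println(wholeStringMatches("calendar simple")); // true
        System.out.println(wholeStringMatches("calendar"));        // true
    }
}
```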
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
Using .* or similar non-deterministic regex sequences inside lookaheads and lookbehinds usually doesn't do what you want.
Ioan Damian Sirbu
Greenhorn
Joined: Dec 22, 2008
Posts: 18
No, it was my fault; I posted the wrong pattern.
By matching (.*calendar.*), I was consuming the whole input. If the input was "calendar simple", the lookahead (.*simple.*) would have nothing left to match.
The correct pattern would be (calendar)(?!.*simple.*). It finds a match in "calendar" or "calendar some words", but not in "calendar simple".
I tested this with the RegexTestHarness in the Sun tutorials.
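A minimal sketch that checks the corrected pattern with Matcher.find(), assuming the harness also searches with find() (as the Sun tutorial's version does) rather than requiring a whole-string match. The last sample is an extra case added here: the lookahead only inspects text after "calendar", so a "simple" that comes earlier is not rejected.

```java
import java.util.regex.Pattern;

public class CorrectedPatternDemo {

    // "calendar" not followed, anywhere later on the line, by "simple".
    static final Pattern CORRECTED = Pattern.compile("(calendar)(?!.*simple.*)");

    // find() looks for a match anywhere in the text.
    static boolean found(String text) {
        return CORRECTED.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("calendar"));            // true
        System.out.println(found("calendar some words")); // true
        System.out.println(found("calendar simple"));     // false
        System.out.println(found("simple calendar"));     // true: only text after the match is checked
    }
}
```

Also note that . does not match line terminators by default, so on multi-line text the lookahead only looks at the rest of the current line.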
PS: Now that I think I understand how this works, I am trying to combine lookahead with lookbehind.
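One way to combine the two, sketched under an assumed (hypothetical) 50-character window: Java's lookbehind needs a bounded maximum length, so the sketch uses simple.{0,50} instead of an unbounded .* inside (?<!...). For comparison it also shows a single lookahead anchored at the line start, which avoids the length limit altogether.

```java
import java.util.regex.Pattern;

public class LookbehindAndLookaheadDemo {

    // (?<!simple.{0,50}) : no "simple" followed by at most 50 characters
    //                      directly before "calendar" (bounded lookbehind).
    // (?!.*simple)       : no "simple" anywhere after "calendar" on the line.
    static final Pattern BOTH_SIDES =
            Pattern.compile("(?<!simple.{0,50})calendar(?!.*simple)");

    static boolean found(String text) {
        return BOTH_SIDES.matcher(text).find();
    }

    // Alternative without lookbehind: one lookahead at the start of the line
    // rules out "simple" anywhere in the line, before or after "calendar".
    static final Pattern ANCHORED = Pattern.compile("^(?!.*simple).*calendar");

    static boolean foundAnchored(String text) {
        return ANCHORED.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("calendar only"));            // true
        System.out.println(found("calendar simple"));          // false
        System.out.println(found("simple calendar"));          // false
        System.out.println(foundAnchored("simple calendar"));  // false
    }
}
```

The anchored version is usually simpler to reason about; the lookbehind version only makes sense when the match itself must start at "calendar".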