GeeCON Prague 2014*
Regex - match group that contains word AND does not contain another word

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
Greetings,

I am having trouble finding a regex pattern that matches any text that contains one group of letters and, at the same time, does not contain another group of letters.
Iterating a file line by line, I need to extract the lines containing the word 'input', AND not containing the word 'type'.
So, 'input damian whatever' is a match, while 'input damian type whatever' is not.

Any ideas?
Vivek Singh
Ranch Hand

Joined: Oct 27, 2009
Posts: 92
So why is a regular expression required?
Since you know the exact text you need, try this.
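
The snippet that followed this reply is not preserved. A minimal sketch of the kind of plain-string check being suggested, assuming String.contains is enough for the stated example; the class and method names are made up for illustration and are not the code originally posted:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the code originally posted in this reply.
class ContainsFilter {

    // True when the line contains "input" and does not contain "type".
    static boolean accept(String line) {
        return line.contains("input") && !line.contains("type");
    }

    // Keeps only the accepted lines, preserving their order.
    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (accept(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(accept("input damian whatever"));      // true
        System.out.println(accept("input damian type whatever")); // false
    }
}
```

This is a substring check, so it would also reject a line containing a word such as "prototype".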

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
I was giving an arbitrary example.
The concrete situation is:
- I need to search all files in Eclipse.
- I need to find the files that contain a custom tag like <input type="calendar"> but not like <input type="calendar" theme="simple">.

I think Eclipse's regex search is an option, and at the same time this regex dilemma is interesting in itself.
Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
So, any ideas?
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30586
    

Damian Sarbu wrote:So, any ideas?

Yes. Negative lookahead/lookbehind.

It asserts that a given regular expression does not match after (lookahead) or before (lookbehind) the part you are interested in, without consuming any text.
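
As a rough illustration of the lookahead idea applied to the original 'input' / 'type' example (a sketch only; the class name and the exact pattern are my own assumptions, not something posted in this thread):

```java
import java.util.regex.Pattern;

// Hypothetical example class for the 'input' but not 'type' question.
class LookaheadLineFilter {

    // (?!.*type) at the start asserts that "type" appears nowhere in the line;
    // .*input.* then requires "input" somewhere in the line.
    static final Pattern INPUT_WITHOUT_TYPE =
            Pattern.compile("^(?!.*type).*input.*$");

    // matches() tests the whole line against the pattern.
    static boolean accept(String line) {
        return INPUT_WITHOUT_TYPE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(accept("input damian whatever"));      // true
        System.out.println(accept("input damian type whatever")); // false
        System.out.println(accept("type before input"));          // false
    }
}
```

Because the lookahead sits at the start of the line, it rejects 'type' whether it comes before or after 'input'. To match whole words only, the two words could be wrapped in \b word boundaries.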

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
Thank you, I actually found a good tutorial right here: http://www.javaranch.com/journal/2003/04/RegexTutorial.htm
For whoever is interested, the regex should look like this:
((.*calendar.*)(?! .*simple.*))
Ireneusz Kordal
Ranch Hand

Joined: Jun 21, 2008
Posts: 423
Damian Sarbu wrote:For whoever is interested, the regex should look like this
((.*calendar.*)(?! .*simple.*))

Maybe this pattern works in Eclipse, but in Java it doesn't work as you expect:

Results:
true
false
true
true
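
The test code from this reply is not preserved, and the four inputs behind the results above are unknown. A sketch with made-up inputs that shows the same kind of behavior under Matcher.matches() (the class name and the sample strings are assumptions):

```java
import java.util.regex.Pattern;

// Hypothetical demo; the inputs are invented, not the ones used in this reply.
class GreedyLookaheadDemo {

    // The pattern exactly as posted earlier in the thread.
    static final Pattern POSTED =
            Pattern.compile("((.*calendar.*)(?! .*simple.*))");

    static boolean matchesWholeLine(String line) {
        return POSTED.matcher(line).matches();
    }

    public static void main(String[] args) {
        // The greedy .* after "calendar" swallows the rest of the line, so the
        // negative lookahead is evaluated at the end of the input, where
        // nothing follows and it trivially succeeds.
        System.out.println(matchesWholeLine("calendar"));        // true
        System.out.println(matchesWholeLine("calendar simple")); // true
        System.out.println(matchesWholeLine("simple"));          // false
    }
}
```

In other words, with matches() the posted pattern accepts any single line that contains "calendar", whether or not "simple" follows.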
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19697
    

Using .* or similar non-deterministic regex sequences inside lookaheads and lookbehinds usually doesn't do what you want.


Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
No, it was my fault. I posted the wrong pattern.
By matching (.*calendar.*), I was capturing the whole expression. If the input was "calendar simple", the lookahead (.*simple.*) would have nothing left to match.
The correct pattern would be (calendar)(?!.*simple.*). This would find a match in "calendar" or "calendar some words", but not in "calendar simple".

I tested this with the RegexTestHarness in the Sun tutorials.
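
For anyone who wants to reproduce this outside the harness, a small sketch assuming Matcher.find() semantics (a substring search). The class name and the second pattern are my own additions for comparison, not something from the tutorial:

```java
import java.util.regex.Pattern;

// Hypothetical demo of the corrected pattern using find().
class CorrectedPatternDemo {

    // The corrected pattern from this post: rejects "simple" only AFTER "calendar".
    static final Pattern AFTER_ONLY =
            Pattern.compile("(calendar)(?!.*simple.*)");

    // Assumed variant: a lookahead anchored at the start of the input rejects
    // "simple" anywhere in a single-line input.
    static final Pattern ANYWHERE =
            Pattern.compile("^(?!.*simple).*calendar");

    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(AFTER_ONLY, "calendar"));            // true
        System.out.println(found(AFTER_ONLY, "calendar some words")); // true
        System.out.println(found(AFTER_ONLY, "calendar simple"));     // false
        System.out.println(found(AFTER_ONLY, "simple calendar"));     // true
        System.out.println(found(ANYWHERE, "simple calendar"));       // false
    }
}
```

Note that the corrected pattern only looks ahead of "calendar", so "simple calendar" still matches; the anchored variant is one way to cover that case.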



PS: Now that I think I understand how this works, I am trying to combine lookahead with lookbehind.




 