Regex - match group that contains word AND does not contain another word

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
Greetings,

I am trying to find a regex pattern that matches any text containing one group of letters while, at the same time, not containing another group of letters.
Iterating over a file line by line, I need to extract the lines containing the word 'input' AND not containing the word 'type'.
So, 'input damian whatever' is a match, while 'input damian type whatever' is not.

Any ideas?
Vivek Singh
Ranch Hand

Joined: Oct 27, 2009
Posts: 92
So why is a regular expression required?
Since you know the exact text you need, try this.
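
A regex-free check for the example in the question could be sketched as follows. The class name `LineFilter` and the helper `wanted` are hypothetical, chosen only for illustration:

```java
class LineFilter {
    // Hypothetical helper: true when the line contains "input" but not "type".
    static boolean wanted(String line) {
        return line.contains("input") && !line.contains("type");
    }

    public static void main(String[] args) {
        System.out.println(wanted("input damian whatever"));      // true
        System.out.println(wanted("input damian type whatever")); // false
    }
}
```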

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
I was giving an arbitrary example.
The concrete situation is:
- I need to make a search in Eclipse in all files.
- I need to find the files that contain a custom tag that is like this <input type="calendar"> but is not like <input type="calendar" theme="simple">

I think using Eclipse's regex search is an option, and at the same time this regex dilemma is interesting in itself.
Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
So, any ideas?
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30116
    

Damian Sarbu wrote:So, any ideas?

Yes. Negative lookahead/lookbehind.

It checks that a regular expression does not match after (lookahead) or before (lookbehind) the one you are interested in.
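
For the line-by-line case in the original question, a negative lookahead anchored at the start of the line is one way to express "contains 'input' and does not contain 'type'". This is a minimal sketch, assuming each line is tested separately; the class and helper names are hypothetical:

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // ^(?!.*type) asserts, from the start of the line, that "type" occurs nowhere;
    // .*input then requires "input" somewhere on the same line.
    static final Pattern INPUT_WITHOUT_TYPE = Pattern.compile("^(?!.*type).*input");

    // Hypothetical helper name, used only for this illustration.
    static boolean accepts(String line) {
        return INPUT_WITHOUT_TYPE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts("input damian whatever"));      // true
        System.out.println(accepts("input damian type whatever")); // false
        System.out.println(accepts("type before input"));          // false
    }
}
```

Anchoring the lookahead at `^` means the excluded word is rejected wherever it appears on the line, not only after the wanted word.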

Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
Thank you, I actually found a good tutorial right here: http://www.javaranch.com/journal/2003/04/RegexTutorial.htm
For whoever is interested, the regex should look like this:
((.*calendar.*)(?! .*simple.*))
Ireneusz Kordal
Ranch Hand

Joined: Jun 21, 2008
Posts: 423
Damian Sarbu wrote:For whoever is interested, the regex should look like this
((.*calendar.*)(?! .*simple.*))

Maybe this pattern works in Eclipse, but in Java it doesn't work as you expect:

Results:
true
false
true
true
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19651
    

Using .* or similar non-deterministic regex sequences inside lookaheads and lookbehinds usually doesn't do what you want.
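
The effect can be reproduced with a small sketch like the one below. The test strings are assumptions based on the tags mentioned earlier in the thread, not the inputs used in the post above, and the class and helper names are hypothetical:

```java
import java.util.regex.Pattern;

class GreedyLookaheadDemo {
    // The pattern posted earlier in the thread, unchanged.
    static final Pattern POSTED = Pattern.compile("((.*calendar.*)(?! .*simple.*))");

    // Hypothetical helper: whole-input match, as String.matches() would do.
    static boolean matchesWhole(String text) {
        return POSTED.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The greedy .* after "calendar" runs to the end of the input, so the
        // negative lookahead is evaluated where nothing is left and always succeeds.
        System.out.println(matchesWhole("<input type=\"calendar\">"));                  // true
        System.out.println(matchesWhole("<input type=\"calendar\" theme=\"simple\">")); // true
        System.out.println(matchesWhole("<input type=\"text\">"));                      // false
    }
}
```

In other words, with this pattern the lookahead never gets a chance to see "simple", so any input containing "calendar" is accepted.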


Ioan Damian Sirbu
Greenhorn

Joined: Dec 22, 2008
Posts: 18
No, it was my fault. I posted the wrong pattern.
By matching (.*calendar.*), I was capturing the whole expression. If the input was "calendar simple", the lookahead (?! .*simple.*) would have nothing left to match.
The correct pattern would be (calendar)(?!.*simple.*). This would return true for "calendar" or "calendar some words", but false for "calendar simple".

I tested this with the RegexTestHarness in the Sun tutorials.
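
The results quoted above can be checked with a sketch like this one, assuming the harness searches with `find()` rather than `matches()`, as the tutorial's version does. The class and helper names are hypothetical:

```java
import java.util.regex.Pattern;

class CorrectedPatternDemo {
    static final Pattern CORRECTED = Pattern.compile("(calendar)(?!.*simple.*)");

    // Searches for the pattern anywhere in the text.
    static boolean found(String text) {
        return CORRECTED.matcher(text).find();
    }

    // Requires the pattern to consume the entire text.
    static boolean matchesWhole(String text) {
        return CORRECTED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(found("calendar"));            // true
        System.out.println(found("calendar some words")); // true
        System.out.println(found("calendar simple"));     // false
        // The lookahead only inspects text after "calendar".
        System.out.println(found("simple calendar"));     // true
        // matches() fails here because only "calendar" is consumed.
        System.out.println(matchesWhole("calendar some words")); // false
    }
}
```

Two caveats follow from this: the pattern does not reject "simple" when it appears before "calendar", and it behaves differently under `matches()` because the lookahead consumes no characters.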



PS: Now that I think I understand how this works, I am trying to combine lookahead with lookbehind.




 