I'm rather close to a solution, but I'm struggling with some RegEx.
I'm attempting to read a text file and parse its contents. The file contains HTML. The objective is to find all href attributes on anchor tags and do some processing (find/replace) on those values. I wrote up some quick RegEx to accomplish that.
The problem I'm faced with is that I need to exclude href attributes that contain the string ".do". I have been experimenting with the following regex pattern, and it will find all href attributes that contain ".do", but when I attempt to use the not operator I do not get any matches.
Rob Prime wrote: Because ! is not a valid regex operator in Java except in negative lookaheads / negative lookbehinds. In your example you are looking for the literal !.
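To illustrate the point in the quote, here is a minimal sketch. The class name BangIsLiteral, the helper found, and the sample strings are made up for the demonstration; only java.util.regex.Pattern and Matcher.find() are real API. It shows that a bare ! matches an exclamation mark, while (?! ... ) is the negative lookahead syntax.

```java
import java.util.regex.Pattern;

class BangIsLiteral {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // A bare '!' is not a "not" operator; it only matches a literal '!'.
        System.out.println(found("!\\.do", "save.do"));   // false
        System.out.println(found("!\\.do", "save!.do"));  // true

        // Inside (?! ... ) the '!' is part of the negative lookahead syntax:
        // match the whole string only if ".do" does not occur anywhere in it.
        System.out.println(found("^(?!.*\\.do).*$", "save.do"));    // false
        System.out.println(found("^(?!.*\\.do).*$", "index.html")); // true
    }
}
```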
Rob, thanks for the response. I took a look at negative lookaheads, but I'm still struggling with the syntax and how regex is put together. Here is my new statement.
I think I can probably figure this out if someone could validate what I think is going on. Breaking the statement down into sections, I read it as follows.
href=[\"|\'] - Look for href= followed by a " or ' character
(.+?) - Match one or more instances of any character
(?!\.do) - Read ahead, but do not match strings that contain the characters .do
[?|\"|\'] - Match the ?, ", or ' character
I guess I don't understand how the read ahead special characters work.
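For anyone following along, here is a rough sketch of why the placement of the lookahead matters. It assumes the statement is simply the four pieces above concatenated (ORIGINAL below); the class name, the hrefs helper, the sample HTML, and the WITHOUT_DO pattern are all made up for illustration and are not from the thread. A lookahead is zero-width: it only tests the text starting at the exact position where it sits. In ORIGINAL that position is right before the closing character, so the test almost never fails where it needs to. WITHOUT_DO repeats the test before every character of the value instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefLookahead {
    // Assumed reconstruction: the four pieces from the breakdown, concatenated.
    static final Pattern ORIGINAL =
        Pattern.compile("href=[\"|'](.+?)(?!\\.do)[?|\"|']");

    // Sketch of one alternative: before consuming each character of the value,
    // check that ".do" does not start there; \1 requires the same closing quote.
    static final Pattern WITHOUT_DO =
        Pattern.compile("href=([\"'])((?:(?!\\.do)[^\"'])*)\\1");

    // Collects the given capture group from every match in the input.
    static List<String> hrefs(Pattern p, int group, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">Home</a> <a href='save.do?id=1'>Save</a>";
        // The lookahead is only checked right before the terminator, so the
        // ".do" link still comes through (cut off at the '?').
        System.out.println(hrefs(ORIGINAL, 1, html));   // [index.html, save.do]
        // Checked at every position, the ".do" link is skipped.
        System.out.println(hrefs(WITHOUT_DO, 2, html)); // [index.html]
    }
}
```

Note that WITHOUT_DO is deliberately simple: it does not restrict the match to anchor tags and does not handle unquoted attribute values, so for real-world HTML an HTML parser is probably the safer route.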
Brian M Smith
Ireneusz Kordal wrote: I am not absolutely sure that I understand your requirements correctly, but this is probably what you want.
Thank you for the response. This didn't seem to do what I needed it to do. Sorry if I wasn't clear in explaining what I'm trying to accomplish with regex. I'll take a look at that link you provided to see if I can make any headway.