I'm rather close to a solution, but I'm struggling with some RegEx.
I'm attempting to read a text file and parse its contents. The file contains HTML. The objective is to find all href attributes on anchor tags and run some processing (find/replace) on those values. I wrote up a quick regex to accomplish that.
href=[\"|\'](.+?)[?|\"|\']
The problem I'm facing is that I need to exclude href attributes that contain the string ".do". I have been experimenting with the following regex pattern, which finds all href attributes that contain ".do", but when I attempt to use the not operator I do not get any matches.
href=[\"|\'](.+?)\.do[?|\"|\']
Here is the text that I'm testing with
<a href="blahdo">This is the en-us version of this spot<br /><br />aaa This would represent a content spot pull from a file.<br /><br /> <a href="/about">About US</a><br /> <a href="/about.do?id=1a">blah</a><br /><br ><a href="something?id=1111">asdfasdf</a> <b>Testing</b> <table> <tr> <td>1</td> <td>2</td> </tr> <tr> <td colspan="2"> . . 3 </td> </tr> </table> <br /><br /> <a href="custom-cable"></a>
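For anyone who wants to reproduce this outside the testing tool, here is a minimal sketch in Python. Python is only my choice for illustration, and other regex engines may behave slightly differently. The `sample` string is a trimmed copy of the text above (anchor tags only), and `HREF_ANY` is just a name I made up for my first pattern.

```python
import re

# Trimmed copy of the test text: only the anchor tags are kept.
sample = (
    '<a href="blahdo">This is the en-us version of this spot<br />'
    '<a href="/about">About US</a><br />'
    '<a href="/about.do?id=1a">blah</a><br />'
    '<a href="something?id=1111">asdfasdf</a>'
    '<a href="custom-cable"></a>'
)

# The first pattern from the question, unchanged.
HREF_ANY = re.compile(r'href=[\"|\'](.+?)[?|\"|\']')

# With a single capture group, findall returns that group for each match.
values = HREF_ANY.findall(sample)
print(values)
# ['blahdo', '/about', '/about.do', 'something', 'custom-cable']
```

So the first pattern does capture every href value (up to the query string), including the ".do" one that I want to leave out.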
I have been using this testing tool, which lets you see the groups that the regex returns:
http://www.regexplanet.com/simple/index.html
I'm interested to know why the pattern
href=[\"|\'](.+?)!(\.do)[?|\"|\'] will not return the results I expect, whereas
href=[\"|\'](.+?)\.do[?|\"|\'] does.
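To make the comparison concrete, here is a small sketch of what I'm seeing, again in Python purely for illustration. The `tags` list holds the five anchor tags from my test text, one per string, and `WITH_DO`, `WITH_BANG`, and `matching_tags` are names I invented for this example.

```python
import re

# One anchor tag per string, so each check is isolated from the others.
tags = [
    '<a href="blahdo">',
    '<a href="/about">',
    '<a href="/about.do?id=1a">',
    '<a href="something?id=1111">',
    '<a href="custom-cable">',
]

# The two patterns from the question, unchanged.
WITH_DO = re.compile(r'href=[\"|\'](.+?)\.do[?|\"|\']')
WITH_BANG = re.compile(r'href=[\"|\'](.+?)!(\.do)[?|\"|\']')

def matching_tags(pattern, candidates):
    """Return the candidate strings in which the pattern matches somewhere."""
    return [tag for tag in candidates if pattern.search(tag)]

do_hits = matching_tags(WITH_DO, tags)
bang_hits = matching_tags(WITH_BANG, tags)

print(do_hits)    # ['<a href="/about.do?id=1a">']
print(bang_hits)  # []
```

What I was hoping for from the second pattern is the other four tags, i.e. everything except the ".do" link.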
Looking to be taught how to fish here, so please don't just give me an answer without an explanation!