| Author |
Matching words except those in tags with regex
|
Bob Homes
Greenhorn
Joined: Jun 30, 2009
Posts: 5
|
|
Hello,
To begin, I'm not even sure regex is the best tool for this. I want to replace all occurrences of a word, except when it occurs between < and >. I found out that I can use \b for word boundary matches so that only full words are replaced, but finding only the matches outside tags is difficult. I tried to implement some sort of lookahead and lookbehind scheme, but it failed.
So if the string is "<Hello There World>Hello There World" and I want to replace "There" with "bob", the final string would be "<Hello There World>Hello bob World".
Is there a simple way to do this?
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16687
|
|
To begin, I'm not even sure regex is the best tool for this.
It is probably not, but it is doable. For one level of tags (no nesting), it should be straightforward. For two levels, it is still doable. For three or more levels, it gets even harder, and is probably not worth it.
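For comparison, here is a minimal sketch of a non-regex alternative that copes with any nesting depth by counting open brackets while scanning. The class and method names (`DepthScanReplace`, `replaceOutsideTags`, `isBoundary`) are made up for illustration, and the sketch assumes a "word character" means a letter, digit, or underscore, roughly what \b uses.

```java
class DepthScanReplace {
    // Replaces whole-word occurrences of `word` that are outside every <...> pair.
    // Hypothetical helper: tracks nesting depth instead of using a regex.
    static String replaceOutsideTags(String input, String word, String replacement) {
        if (word.isEmpty()) {
            return input; // nothing to search for
        }
        StringBuilder out = new StringBuilder();
        int depth = 0; // number of '<' currently open
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>' && depth > 0) {
                depth--;
            } else if (depth == 0
                    && input.startsWith(word, i)
                    && isBoundary(input, i - 1)
                    && isBoundary(input, i + word.length())) {
                // Whole-word match outside all tags: emit the replacement instead.
                out.append(replacement);
                i += word.length();
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True when idx is outside the string or holds a non-word character.
    static boolean isBoundary(String s, int idx) {
        if (idx < 0 || idx >= s.length()) {
            return true;
        }
        char c = s.charAt(idx);
        return !(Character.isLetterOrDigit(c) || c == '_');
    }

    public static void main(String[] args) {
        // prints <Hello There World>Hello bob World
        System.out.println(replaceOutsideTags("<Hello There World>Hello There World", "There", "bob"));
    }
}
```

Because the scanner only keeps a counter, nested input such as "<a <b There> There> There" is handled the same way as the one-level case; only the last "There" is replaced.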
I tried to implement some sort of lookahead and look behind scheme but it failed.
For one level, lookahead should work. Just search for the word, followed by a lookahead that matches zero or more characters that are not the close tag character (>), followed by (still part of the lookahead) either the open tag character (<) or end of input.
Henry
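One way to read the lookahead description above, as a rough sketch rather than a definitive answer: the class and method names are invented for illustration, and the pattern assumes one level of tags where every < has a matching >.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadReplace {
    // Replaces whole-word occurrences of `word` that are not between < and >.
    // Assumption: tags are not nested and are always closed.
    static String replaceOutsideTags(String input, String word, String replacement) {
        // \b ... \b          whole-word match
        // (?=[^>]*(?:<|\z))  looking ahead, no '>' occurs before the next '<'
        //                    or the end of input, so we are not inside a tag
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b(?=[^>]*(?:<|\\z))");
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints <Hello There World>Hello bob World
        System.out.println(replaceOutsideTags("<Hello There World>Hello There World", "There", "bob"));
    }
}
```

A word inside a tag is always followed by a > before any <, so the lookahead fails there; a word outside a tag reaches either the next < or the end of the string first, so it matches.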
|
Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
|
 |
Bob Homes
Greenhorn
Joined: Jun 30, 2009
Posts: 5
|
|
Here is what I tried, hopefully per your instructions:
Expected output is <hello world lo hel>hello world bob hel
Actual output is <hello world bob hel> hello world bob hel
Most likely it is my regex syntax. I had never heard of look ahead and look behind until earlier today.
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16687
|
|