
Matching words except those in tags with regex

 
Bob Homes
Greenhorn
Posts: 5
Hello,

To begin, I'm not even sure regex is the best tool for this. I want to replace every occurrence of a word, except when it occurs between < and >. I found that I can use \b for word-boundary matches so that only whole words are replaced, but matching only the occurrences outside tags is difficult. I tried to implement some sort of lookahead and look behind scheme but it failed.

So if the string is "<Hello There World>Hello There World" and I want to replace "There" with "bob", the final string would be "<Hello There World>Hello bob World".

Is there a simple way to do this?
 
Henry Wong
author
Marshal
Pie
Posts: 20836
To begin, I'm not even sure regex is the best tool for this.


It is probably not, but it is doable. For one level of tags (no nesting), it should be straightforward. For two levels, it is still doable. For three or more levels, it gets even harder and is probably not worth it.

I tried to implement some sort of lookahead and look behind scheme but it failed.


For one level, a lookahead should work. Search for the word, followed by a lookahead that matches zero or more characters other than the close tag, followed (still inside the lookahead) by either an open tag or the end of input.

Henry
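A minimal Java sketch of the one-level approach described above, assuming tags are never nested and the text contains no stray < or > characters. The class and method names (`ReplaceOutsideTags`, `replaceOutsideTags`) are hypothetical; the pattern is word, then `\b`, then a lookahead over non-`>` characters that must end at `<` or at the end of input (`\z`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceOutsideTags {

    // Hypothetical helper: replaces whole-word occurrences of `word` that are
    // not inside a <...> tag. Assumes one level of tags (no nesting).
    static String replaceOutsideTags(String input, String word, String replacement) {
        // \b ... \b      : whole-word match only
        // (?=[^>]*       : lookahead over zero or more characters that are not '>'
        //    (?:<|\z))   : ...which must be followed by an open tag or the end of input
        // A word inside a tag reaches '>' before any '<', so the lookahead fails there.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b(?=[^>]*(?:<|\\z))");
        // quoteReplacement keeps '$' or '\' in the replacement from being interpreted
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints <Hello There World>Hello bob World
        System.out.println(replaceOutsideTags(
                "<Hello There World>Hello There World", "There", "bob"));
    }
}
```

Because the lookahead only peeks at the text after the word, nothing extra is consumed, so consecutive matches outside the tag are all replaced in one `replaceAll` pass.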
 
Bob Homes
Greenhorn
Posts: 5
Here is what I tried, hopefully per your instructions:



Expected output is <hello world lo hel>hello world bob hel
Actual output is <hello world bob hel> hello world bob hel

Most likely it is my regex syntax. I had never heard of lookahead and lookbehind until earlier today.
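The code from this post did not survive, so the sketch below does not reproduce it. It only compares, on the same kind of input, a pattern with word boundaries alone against one that adds the lookahead Henry describes. The input string and the word "lo" are assumptions inferred from the expected output, and the class and method names are hypothetical. The boundary-only pattern also replaces the "lo" inside the tag, which is the symptom in the actual output.

```java
public class TagLookaheadDemo {

    // Assumed input: the expected output with "bob" changed back to "lo".
    static final String INPUT = "<hello world lo hel>hello world lo hel";

    // Word boundaries only: every whole-word "lo" is replaced, including the one in the tag.
    // ("hello" is untouched because there is no \b between its two l's.)
    static String wordBoundaryOnly() {
        return INPUT.replaceAll("\\blo\\b", "bob");
    }

    // Word boundaries plus the lookahead: after the match, any non-'>' characters
    // must run up to a '<' or the end of input, which rules out text inside a tag.
    static String withLookahead() {
        return INPUT.replaceAll("\\blo\\b(?=[^>]*(?:<|\\z))", "bob");
    }

    public static void main(String[] args) {
        System.out.println(wordBoundaryOnly()); // <hello world bob hel>hello world bob hel
        System.out.println(withLookahead());    // <hello world lo hel>hello world bob hel
    }
}
```

If the tag text is still being changed, checking that the lookahead group is present and ends in `(?:<|\z)` rather than just `<` is a reasonable first step.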
 