Regular Expression: ignore html

 
Nits Kulkarni
Greenhorn
Posts: 8
Hi,

I am doing a find and replace on the text inside my HTML, and I am looking for a regular expression that can do it for me.


Example HTML string: "Welcome to furniture house. <table>How big is the dining table?</table>. Probably you would want one."

Here, I want to replace the text "table" with "<strong>table</strong>", but only where it appears as plain text, not inside an HTML tag. How can I do it with a regex?


Thanks in advance.
Nitin
 
Andy Bach
Greenhorn
Posts: 4
> Here, I want to replace the text "table" with "<strong>table</strong>", but only where it appears as plain text, not inside an HTML tag. How can I do it with a regex?

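Hmm, a negative lookbehind with a character class might work, esp. if you're asking for a specific word like table (perl-ish):
s#(?<![</])\btable\b#<strong>table</strong>#g

That reads: not preceded by a left pointy or a forward slash (to handle both start and end tags), then "table" with a word border on each side (so's not to match "tabletennis"). The lookbehind checks the preceding character without consuming it, so nothing in front of the word gets swallowed by the replacement. I used "#" as the delimiter to avoid the leaning toothpick syndrome.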
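For anyone who wants to try the idea outside Perl, here is a rough Python sketch of the same trick, under the assumption that the only markup to dodge is a tag name written right after "<" or "</". The names `WORD` and `highlight_regex` are made up for this example. It handles the string from the question, and the second call shows how quickly it goes wrong once the word turns up in an attribute value.

```python
import re

# Hypothetical pattern: the whole word "table", unless the character
# immediately before it is "<" or "/" (i.e. it looks like a tag name).
WORD = re.compile(r"(?<![</])\btable\b")


def highlight_regex(html: str) -> str:
    """Wrap plain-text occurrences of "table" in <strong>...</strong>."""
    return WORD.sub("<strong>table</strong>", html)


sample = ("Welcome to furniture house. <table>How big is the dining "
          "table?</table>. Probably you would want one.")

# Works for the example from the question: only the word inside the
# text is wrapped, the <table> and </table> tags are left alone.
print(highlight_regex(sample))

# Breaks as soon as the word appears in an attribute value: the
# attribute gets wrapped too, which corrupts the markup.
print(highlight_regex('<div class="table">a table</div>'))
# → <div class="<strong>table</strong>">a <strong>table</strong></div>
```

So the regex is fine for a quick one-off on input you control, and not much more than that.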

But the general answer is - don't try to parse html by hand, get a module/util to do it. It very, very, very quickly becomes very, very hard to cover all the possibilities by hand.
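To make the "get a module/util" route concrete, here is a minimal sketch using `html.parser` from Python's standard library. It is only one way to do it, not a recommendation of a particular tool. The class name `Highlighter` and the function `highlight_text_nodes` are invented for this example. The parser hands over tags and text separately, so the replacement is applied to text only and the tags are written back unchanged.

```python
import re
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Re-emit the HTML it is fed, wrapping a word only inside text nodes."""

    def __init__(self, word):
        # convert_charrefs=False keeps entities such as &amp; as they were.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b%s\b" % re.escape(word))
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the start tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim as well.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text between tags: the only place the word is replaced.
        self.out.append(
            self.pattern.sub(
                lambda m: "<strong>" + m.group(0) + "</strong>", data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def highlight_text_nodes(html, word="table"):
    parser = Highlighter(word)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(highlight_text_nodes('<div class="table">a table</div>'))
# → <div class="table">a <strong>table</strong></div>
```

The attribute value survives untouched here because the parser never passes it to `handle_data`. One caveat of this sketch: the contents of `<script>` and `<style>` elements also arrive through `handle_data`, so a real version would probably want to skip those.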
 
Hauke Ingmar Schmidt
Rancher
Posts: 436
Andy Bach wrote: But the general answer is - don't try to parse html by hand, get a module/util to do it. It very, very, very quickly becomes very, very hard to cover all the possibilities by hand.


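And it is logically impossible with regex alone: arbitrarily nested tags do not form a regular language, so a true regular expression cannot keep track of them. Look at the Chomsky hierarchy of grammars to see why.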
 