Regular Expression: ignore html

 
Greenhorn
Posts: 8
Hi,

I am doing a find-and-replace on the text inside my HTML, and I am looking for a regular expression that can do it for me.


Example Html string: "Welcome to furniture house. <table>How big is the dining table?</table>. Probably you would want one."

Here I want to replace the word "table" with "<strong>table</strong>", but only in the text, not inside the HTML tags. How can I do it with regex?


Thanks in advance.
Nitin
 
Greenhorn
Posts: 4
> Here, I want to replace text "table" to "<strong>table</strong>", but not inside the html tag but only text "table". how can i do it with regex?

Hmm, a negative character class might work, esp. if you're asking for a specific word like table (Perl-ish):
s#(^|[^</])\btable\b#$1<strong>table</strong>#g

Start of the string or any char that is neither a left pointy nor a forward slash (so neither the start tag <table> nor the end tag </table> matches), captured so $1 can put it back, then "table" with a word border on each side (so's not to match "tabletennis" or "stable"). I used "#" as the delimiter to avoid the leaning toothpick syndrome.
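A minimal sketch of the same negative-character-class idea, assuming Python's `re` module in place of Perl (the names `TABLE_WORD` and `bold_table` are made up for illustration). The character in front of "table" is captured and put back, and both `<` and `/` are excluded so neither the start tag nor the end tag is touched:

```python
import re

# Python's re standing in for the Perl-ish s### one-liner (an assumption).
# Group 1 is the start of the string or one character that is neither "<" nor "/",
# so the tag names in "<table>" and "</table>" are skipped. The \b on each side
# keeps "tabletennis" and "stable" from matching; \1 puts the captured char back.
TABLE_WORD = re.compile(r"(^|[^</])\btable\b")


def bold_table(html: str) -> str:
    """Wrap the bare word "table" in <strong> tags, leaving tag names alone."""
    return TABLE_WORD.sub(r"\1<strong>table</strong>", html)


sample = ("Welcome to furniture house. <table>How big is the dining table?"
          "</table>. Probably you would want one.")
print(bold_table(sample))
# Welcome to furniture house. <table>How big is the dining <strong>table</strong>?</table>. Probably you would want one.
```

It still rewrites "table" inside attribute values such as `class="table"`, which is exactly the kind of case that makes a real parser worth it.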

But the general answer is - don't try to parse html by hand, get a module/util to do it. It very, very, very quickly becomes very, very hard to cover all the possibilities by hand.
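For the "get a module" route, here is a sketch using Python's standard-library `html.parser` (my choice of parser, not one named in the thread; the class and function names are hypothetical). Tags, attributes, entities and comments are copied through, and the word is wrapped only in text data. It is only a sketch: end-tag names come back lowercased, `<script>`/`<style>` contents are treated as text, and rarer constructs such as processing instructions and CDATA sections are dropped.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper names; html.parser is Python's standard-library parser.
WORD = re.compile(r"\btable\b")


class BoldWordRewriter(HTMLParser):
    """Re-emits the markup, wrapping the word only in text data."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; out of the text
        # callbacks, so they can be written back as entities.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the original start tag, attributes included, untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied the same way.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser reports end-tag names in lowercase.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text data (including <script>/<style> contents) arrives here.
        self.out.append(WORD.sub("<strong>table</strong>", data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def bold_table_in_text(html: str) -> str:
    parser = BoldWordRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Unlike the regex, this leaves `<td class="table">` alone and only changes the "table" in the cell text.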
 
Rancher
Posts: 436

Andy Bach wrote:But the general answer is - don't try to parse html by hand, get a module/util to do it. It very, very, very quickly becomes very, very hard to cover all the possibilities by hand.



And it is impossible in general with regular expressions alone: arbitrarily nested HTML is not a regular language. Look at the Chomsky hierarchy of grammars to see why.
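A small Python illustration of the nesting problem (the `<b>` example and the `outermost_b` helper are hypothetical). A lazy pattern stops at the first `</b>` it sees, while finding the matching close tag needs a depth counter that can grow without bound, which is the kind of memory a finite automaton, and so a regular expression, does not have:

```python
import re


def outermost_b(html: str) -> str:
    """Return the first outermost <b>...</b> element by tracking nesting depth."""
    depth = 0
    start = None
    for m in re.finditer(r"</?b>", html):  # regex only finds the tokens
        if m.group() == "<b>":
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                return html[start:m.end()]
    return ""


nested = "<b>outer <b>inner</b> tail</b>"
naive = re.search(r"<b>.*?</b>", nested).group()
print(naive)               # <b>outer <b>inner</b>
print(outermost_b(nested))  # <b>outer <b>inner</b> tail</b>
```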
 