Hi All,
This is the regular expression I am using: (<div)\s+id="article_body">(.*?((\1((.*?(\5|\8).*?|.*?)|.*?)\9)|(\1[^>]*?/>))){1,}(</div>)
to extract the complete <div id='article_body'> element. Note that this element can contain other <div> tags as well as other tags, and there can be other <div> tags before or after it. My expression does not match correctly.
The contents are as follows:
// Snip
Please help me get the right regular expression.
Thanks in advance.
regards,
kk.
Please don't post huge amounts of code. Try to give a small example that explains your problem. Also tell us what you think
should happen and what actually happened.
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Hi all,
What I want is the whole <div id='article_body'> element from the contents of the attached file. The regular expression I provided tries to account for nesting: this element can be nested within other <div> tags, and other <div> tags can be nested inside it.
My expression gives wrong results: it extracts the contents from article_body up to either the first </div> tag or the last </div> tag. Both are wrong. The extracted contents should end at the </div> tag that closes <div id='article_body'>.
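The two wrong results described above can be reproduced with a much simpler pattern. Here is a minimal Python sketch, assuming a made-up sample string (not the attached file) and plain lazy and greedy quantifiers rather than the original expression:

```python
import re

# Hypothetical sample input, invented for illustration only.
html = ("<div id='parent'><div id='article_body'><div>sss</div>ssdd"
        "<div><div />sfdfd</div></div><div>after</div></div>")

# What should be extracted: everything up to the </div> that closes article_body.
wanted = ("<div id='article_body'><div>sss</div>ssdd"
          "<div><div />sfdfd</div></div>")

# Lazy quantifier: stops at the FIRST </div> after the opening tag.
lazy = re.search(r"<div\s+id='article_body'>.*?</div>", html, re.S).group(0)

# Greedy quantifier: runs on to the LAST </div> in the whole input.
greedy = re.search(r"<div\s+id='article_body'>.*</div>", html, re.S).group(0)

print(lazy)
print(greedy)
print(wanted in (lazy, greedy))
```

Neither match equals the wanted text, because finding the right closing tag requires counting how many <div> tags are still open, and a quantifier alone cannot do that.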
I numbered the groups in the expression from left to right (I am not sure that is the right order).
The cases may be:
1) There are no tags inside the article_body element.
2) Nested tags, like <div id='parent'><div id='article_body'><div>sss</div>ssdd<div><div />sfdfd</div></div></div>
To handle nesting I used backreferences to groups.
Alternative solutions, such as a good open source HTML parser, are also welcome. Please suggest an HTML parser.
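For the parser route, the idea is to count open <div> tags instead of matching them with a pattern. Below is a minimal sketch using the html.parser module from the Python standard library. The class name DivExtractor, the function extract_div, and the sample string are made up for illustration. It assumes that a self-closing <div /> does not open a new level, and it drops comments inside the element.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collects the source text of the <div> with a given id, nested tags included."""

    def __init__(self, target_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # number of open <div> tags inside the target (0 = outside)
        self.parts = []     # pieces of the target element, in document order
        self.done = False   # True once the target's own </div> has been seen

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Still looking for the opening tag of the target element.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Assumption: self-closing tags like <div /> leave the depth unchanged.
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside():
            return
        self.parts.append("</%s>" % tag)
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True   # this </div> closes the target element

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self._inside():
            self.parts.append("&#%s;" % name)


def extract_div(html, target_id):
    """Return the text of <div id=target_id>...</div>, or None if not found or unclosed."""
    parser = DivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


# Hypothetical sample input, invented for illustration only.
sample = ("<div id='parent'><div id='article_body'><div>sss</div>ssdd"
          "<div><div />sfdfd</div></div><div>after</div></div>")
print(extract_div(sample, "article_body"))
```

On the sample string this prints the article_body element up to its own closing </div>, leaving out the parent <div> and the sibling <div>after</div>. A full HTML parser library would also cope with malformed markup, which this sketch does not try to repair.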