Extract text lying between two patterns

 
Abhinav Srivastava
Ranch Hand
Posts: 354
Eclipse IDE Oracle Java
I have one file with content like



I need to create a new file which would look like



Basically, I need to find all occurrences of the text between <?xml> and </Product>.

I tried sed -n and awk range commands, but they don't seem to give the desired output.


Any ideas?
 
Tim Holloway
Saloon Keeper
Posts: 27752
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
That's not going to produce a valid XML file. The <?xml ...?> declaration can appear only once in an XML document, and only at the very start.

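As for the rest of it, a likely reason your match fails is that "?" is a regex metacharacter in extended regular expressions (awk, sed -E, grep -E). There, "<?xml" does not match the literal text; it matches an optional "<" followed by "xml". In those tools you need to escape it as "<\?xml". Note that sed's default basic regular expressions work the other way around: a plain "?" is literal, and GNU sed treats "\?" as the optional-match operator.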
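To make the escaping point concrete, here is a minimal sketch. It uses grep only to count matching lines, and the two sample lines are made up for illustration; the regex behaviour shown is the same one awk and sed -E would apply.

```shell
# Two sample lines: only the second contains the literal text "<?xml".
sample='xml only
<?xml version="1.0"?>'

# ERE, unescaped: "<?" means "optional <", so any line containing "xml" matches.
unescaped=$(printf '%s\n' "$sample" | grep -cE '<?xml')

# ERE, escaped: "\?" is a literal question mark.
escaped=$(printf '%s\n' "$sample" | grep -cE '<\?xml')

# Bracket expression: "[?]" is a literal question mark in both BRE and ERE.
bracket=$(printf '%s\n' "$sample" | grep -c '<[?]xml')

echo "unescaped=$unescaped escaped=$escaped bracket=$bracket"
# prints: unescaped=2 escaped=1 bracket=1
```

If you are unsure which regex flavour a tool is using, the "[?]" form is the safest choice, since it means the same thing in both.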
 
Abhinav Srivastava
Ranch Hand
I don't want it to be an XML doc, rather just a text file containing XML fragments. Actually, it's not about XML at all, just the text.
My problem is that sed is spitting out the entire line where it finds the match, not just the text lying between the two patterns.
 
Tim Holloway
Saloon Keeper
Abhinav Srivastava wrote: I don't want it to be an XML doc, rather just a text file containing XML fragments. Actually, it's not about XML at all, just the text.
My problem is that sed is spitting out the entire line where it finds the match, not just the text lying between the two patterns.



You can use parentheses to delimit match groups, like so:

<Product>(.*)</Product>

Then you can reference the match group by its group number. It's usually something like "$1" for the first group, "$2" for the second group (if you have multiple groups), and so forth. The exact form varies depending on the app or library doing the matching.
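As a rough sketch of the match-group idea in sed, where the first group is written "\1" rather than "$1": the function name between_tags and the sample input are hypothetical, and this only handles a tag pair that opens and closes on the same line.

```shell
# Print only the text captured between the tags, one result per matching line.
# -n suppresses sed's default output; the trailing "p" prints substituted lines only.
between_tags() {
  sed -n 's/.*<Product>\(.*\)<\/Product>.*/\1/p' "$@"
}

printf '%s\n' 'id=7 <Product>Widget</Product> end' 'no tags here' | between_tags
# prints: Widget
```

Because the leading ".*" is greedy, a line holding several <Product>...</Product> pairs yields only the last one, and text that spans multiple lines is not matched at all.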

AWK is probably better for this than sed. Sed can be programmed to do it, but it requires various buffer tricks. AWK would be much simpler. Something vaguely like the following:



I'm out of practice with AWK, though, so expect to do some heavy tweaking to make it work.
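For anyone landing here later, the following is one hedged way the awk approach could look. It is an independent sketch, not the code from the post above, and it assumes the markers are the literal strings "<?xml" and "</Product>". The function name extract_fragments and the variables start, stop, and inside are made up for illustration. It uses index() and substr() rather than regex matching, which sidesteps the "?" escaping issue and lets a fragment begin or end in the middle of a line.

```shell
# Print every stretch of text from "<?xml" through "</Product>", inclusive.
# Text outside those markers is dropped, even when it shares a line with them.
extract_fragments() {
  awk -v start='<?xml' -v stop='</Product>' '
    {
      line = $0
      # Preserve blank lines that fall inside a fragment.
      if (inside && line == "") { print ""; next }
      while (length(line) > 0) {
        if (!inside) {
          s = index(line, start)          # literal search, no regex
          if (s == 0) break               # no fragment starts on the rest of this line
          line = substr(line, s)          # discard text before the start marker
          inside = 1
        }
        e = index(line, stop)
        if (e == 0) { print line; break } # fragment continues on the next line
        print substr(line, 1, e + length(stop) - 1)
        line = substr(line, e + length(stop))   # keep scanning after the end marker
        inside = 0
      }
    }
  ' "$@"
}

# Example: junk before, between, and after the fragments is removed.
printf '%s\n' 'junk<?xml version="1.0"?><Product>' 'A</Product>more junk' | extract_fragments
```

Each fragment piece is printed on its own line, so a fragment that ends mid-line is followed by a newline rather than by the text that originally came after it.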
 
Greenhorn
Posts: 2
Did you get this puzzle worked out?
 