
Extract string between two patterns

 
amol Bhandwale
Greenhorn
Posts: 4
Hi All,

I have the following file:

ISA*00* *00* *02*NFIA *02*ETRV *100202*1320*U*00401*000005297*0*P*>~GS*SM*NFIA*ETRV*20100202*1320*2661*X*004010~ST*204*6816~B2**ETRV**Amol990**PP~B2A*04~L11*807652409*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*3*E~G62*10*20100210*2*000100*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - TRACY*ZZ*0C~N3*PLANT NO. 503*450 EAST GRANT LINE ROAD~N4*TRACY*CA*95376~OID*CO748945*009360128304~S5*2*CU*44768*L*836*CA*3*E~G62*70*20100211*3*053000*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO748945*009360128304~L3*44768*G*******3*E*836*L~SE*28*6816~ST*204*6817~B2**ETRV**807165263100**PP~B2A*00~L11*807652600*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*1*E~G62*10*20100207*2*000100*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - PUEBLO*ZZ*0L~N3*PLANT NO. 514*#1 DOANE PLACE~N4*PUEBLO*CO*81001~OID*CO746858*009360125362~S5*2*CU*44768*L*836*CA*1*E~G62*70*20100208*3*063000*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO746858*009360125362~L3*44768*G*******1*E*836*L~SE*28*6817~GE*2*2661~IEA*1*000005297~


I want to extract the string inside this segment, ~B2**ETRV**Amol990**PP~B2A*04~, i.e. Amol990.

I tried:

awk '/ETRV/, /PP~B2A/ ' filename

but it returns:


ISA*00* *00* *02*NFIA *02*ETRV *100401*1645*U*00401*000015937*0*P*>~
GS*SM*NFIA*ETRV*20100401*1645*7852*X*004010~
ST*204*23118~
B2**ETRV**807690591**PP~
B2A*04~
L11*807690591*MB~
MS3*ETRV*B**M~
N1*CA*Mars Petcare c/o NFI Interactive~
N3*1515 Burnt Mill Rd*Former Doane Petcare~
N4*Cherry Hill*NJ*08003~
G61*IC*Joe Perez*TE*856-857-1324x2516~
S5*1*CL*43520*L***1*E~
G62*10*20100401*2*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*SF*MARS PET CARE - SAN BERNARDINO*ZZ*0E~
N3*PLANT NO. 505*2765 LEXINGTON WAY #15~
N4*SAN BERNARDINO*CA*92407~
OID*RO216442~
S5*2*CU*43520*L***1*E~
G62*70*20100405*3*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*ST*MARS PET CARE - JOPLIN*FA*0A~
N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~
N4*JOPLIN*MO*64802~
OID*RO216442~
L3*43520*G*******1*E*0*L~
SE*25*23118~
GE*1*7852~
IEA*1*000015937~



Please help
Thanks,
Amol.



 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
I would probably use Perl, not awk:


Here the script takes its input from stdin.
 
Sergey Babkin
author
Ranch Hand
Posts: 50
Awk /pattern/ arguments select lines, not substrings inside lines. /a/,/b/ means "start from the line containing a and go up to and including the line containing b".
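To make that concrete: awk can still pull a substring out of a line, just not with a range pattern. A minimal sketch using POSIX `match()`, which sets `RSTART` and `RLENGTH` for the leftmost match. The function name `extract_b2_ref` is hypothetical, and it assumes the value never contains `*` or `~`, which holds for the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print the text between "B2**ETRV**" and "**PP"
# for every such segment found anywhere on each input line.
extract_b2_ref() {
  awk '{
    s = $0
    # match() sets RSTART/RLENGTH for the leftmost match of the regex in s
    while (match(s, /B2\*\*ETRV\*\*[^*~]*\*\*PP/)) {
      seg = substr(s, RSTART, RLENGTH)
      # drop the 10-char prefix "B2**ETRV**" and the 4-char suffix "**PP"
      print substr(seg, 11, length(seg) - 14)
      # keep searching in the rest of the line, after this match
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}
```

After defining the function, run it as `extract_b2_ref < filename`. On the sample file above it should print `Amol990` and then `807165263100`, one line per `B2**ETRV**...**PP` segment.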

The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract. If this is an issue, and you can afford to read the whole file into memory at once, set

$/ = undef;

(and then restore its previous value after reading the file).
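The same read-everything-first idea carries over to awk. A minimal sketch, assuming the line breaks are only wrapping (so they can be dropped rather than kept as data) and that the file fits in memory; `extract_b2_ref_slurp` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: like a Perl slurp ($/ = undef), read all input first,
# join the lines without their newlines, then search the whole buffer.
# Assumes newlines are only line wrapping, never part of the data.
extract_b2_ref_slurp() {
  awk '
    { buf = buf $0 }   # append each line, dropping its newline
    END {
      s = buf
      while (match(s, /B2\*\*ETRV\*\*[^*~]*\*\*PP/)) {
        seg = substr(s, RSTART, RLENGTH)
        # strip the "B2**ETRV**" prefix (10 chars) and "**PP" suffix (4 chars)
        print substr(seg, 11, length(seg) - 14)
        s = substr(s, RSTART + RLENGTH)
      }
    }'
}
```

Because the whole buffer is searched at the end, a segment split across two physical lines is still found.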
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Sergey Babkin wrote:
The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract.


I was waiting for the "Arrrrrgh but" response that is indicative of creeping requirements!
 
Pat Farrell
Rancher
Posts: 4678
Linux Mac OS X VI Editor
Your regular expression is greedy. You need to use the non-greedy version.

A greedy expression tries to match the longest possible string, so the wildcard is expanded and expanded and expanded.....
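For illustration only (no pattern with `.*` appears in the posts above): POSIX awk regular expressions have no non-greedy `.*?` like Perl's, so the usual workaround is a negated character class that cannot run past the delimiter. The function names are hypothetical.

```shell
#!/bin/sh
# Leftmost match of a greedy pattern: .* runs on to the LAST "**PP".
greedy_match() {
  printf '%s\n' "$1" | awk '
    match($0, /ETRV\*\*.*\*\*PP/) { print substr($0, RSTART, RLENGTH) }'
}

# Leftmost match of a bounded pattern: [^*]* cannot cross a "*",
# so the match stops at the first "**PP" after "ETRV**".
bounded_match() {
  printf '%s\n' "$1" | awk '
    match($0, /ETRV\*\*[^*]*\*\*PP/) { print substr($0, RSTART, RLENGTH) }'
}
```

With the input `B2**ETRV**Amol990**PP~B2A*04~B2**ETRV**807**PP~`, `greedy_match` returns everything from the first `ETRV**` to the last `**PP`, while `bounded_match` returns just `ETRV**Amol990**PP`.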
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Pat Farrell wrote: Your regular expression is greedy. You need to use the non-greedy version.


Could you explain which regex is greedy? I don't think the one the OP used in his awk script is greedy, and my Perl one certainly isn't.
 
Kurosaki Ichigo
Greenhorn
Posts: 10
James Sabre wrote: I would use probably Perl not awk -

awk can do the job just as well when it comes to file parsing, sometimes even better and faster than Perl.
 
Kurosaki Ichigo
Greenhorn
Posts: 10
amol Bhandwale wrote: Hi All,
And want to extract the string between this segment ~B2**ETRV**Amol990**PP~B2A*04~ ie Amol990

i tried with

awk '/ETRV/, /PP~B2A/ ' filename


This is one way you can do it:

Set the record separator (RS) to PP~B2A, then set the field separator (FS) to ETRV. The last field will then contain what you want.
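A minimal sketch of that RS/FS approach. It assumes gawk or mawk, where a multi-character `RS` is treated as a regular expression; POSIX leaves a multi-character `RS` unspecified, and some older awks use only its first character. The last field still carries the `**` delimiters, so they are stripped, and records whose last field does not look like `**value**` (such as the tail of the file) are skipped. `extract_b2_ref_rs` is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the RS/FS idea. Assumes gawk or mawk (multi-character RS).
extract_b2_ref_rs() {
  awk 'BEGIN { RS = "PP~B2A"; FS = "ETRV" }
    # In a record that ends just before "PP~B2A", the last field looks like
    # "**Amol990**"; records that do not match (e.g. the tail of the file) are skipped.
    $NF ~ /^\*\*[^*~]*\*\*$/ {
      ref = $NF
      gsub(/\*/, "", ref)   # strip the "**" delimiters
      print ref
    }'
}
```

Run as `extract_b2_ref_rs < filename`; each value is printed on its own line.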
 