JavaRanch » Java Forums » Engineering » Linux / UNIX
Extract string between two patterns

amol Bhandwale
Greenhorn

Joined: Jul 12, 2010
Posts: 4
Hi All,

I have the following file:

ISA*00* *00* *02*NFIA *02*ETRV *100202*1320*U*00401*000005297*0*P*>~GS*SM*NFIA*ETRV*20100202*1320*2661*X*004010~ST*204*6816~B2**ETRV**Amol990**PP~B2A*04~L11*807652409*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*3*E~G62*10*20100210*2*000100*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - TRACY*ZZ*0C~N3*PLANT NO. 503*450 EAST GRANT LINE ROAD~N4*TRACY*CA*95376~OID*CO748945*009360128304~S5*2*CU*44768*L*836*CA*3*E~G62*70*20100211*3*053000*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO748945*009360128304~L3*44768*G*******3*E*836*L~SE*28*6816~ST*204*6817~B2**ETRV**807165263100**PP~B2A*00~L11*807652600*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*1*E~G62*10*20100207*2*000100*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - PUEBLO*ZZ*0L~N3*PLANT NO. 514*#1 DOANE PLACE~N4*PUEBLO*CO*81001~OID*CO746858*009360125362~S5*2*CU*44768*L*836*CA*1*E~G62*70*20100208*3*063000*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO746858*009360125362~L3*44768*G*******1*E*836*L~SE*28*6817~GE*2*2661~IEA*1*000005297~


I want to extract the value inside this segment, ~B2**ETRV**Amol990**PP~B2A*04~, i.e. Amol990.

I tried:

awk '/ETRV/, /PP~B2A/ ' filename

but it returns


ISA*00* *00* *02*NFIA *02*ETRV *100401*1645*U*00401*000015937*0*P*>~
GS*SM*NFIA*ETRV*20100401*1645*7852*X*004010~
ST*204*23118~
B2**ETRV**807690591**PP~
B2A*04~
L11*807690591*MB~
MS3*ETRV*B**M~
N1*CA*Mars Petcare c/o NFI Interactive~
N3*1515 Burnt Mill Rd*Former Doane Petcare~
N4*Cherry Hill*NJ*08003~
G61*IC*Joe Perez*TE*856-857-1324x2516~
S5*1*CL*43520*L***1*E~
G62*10*20100401*2*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*SF*MARS PET CARE - SAN BERNARDINO*ZZ*0E~
N3*PLANT NO. 505*2765 LEXINGTON WAY #15~
N4*SAN BERNARDINO*CA*92407~
OID*RO216442~
S5*2*CU*43520*L***1*E~
G62*70*20100405*3*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*ST*MARS PET CARE - JOPLIN*FA*0A~
N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~
N4*JOPLIN*MO*64802~
OID*RO216442~
L3*43520*G*******1*E*0*L~
SE*25*23118~
GE*1*7852~
IEA*1*000015937~



Please help
Thanks,
Amol.



James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

I would probably use Perl, not awk:


Here the script takes its input from stdin.


Retired horse trader.
 Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.
Sergey Babkin
author
Ranch Hand

Joined: Apr 05, 2010
Posts: 50
In awk, /regex/ patterns select lines, not substrings inside lines. /a/,/b/ means "start from the line containing a and continue up to and including the line containing b".
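Because a range pattern selects whole lines, pulling one value out of the middle of a line takes a regex match on the line itself. The following is a minimal sketch, not anyone's posted solution. It assumes each wanted value sits in a segment shaped like B2**ETRV**<value>**PP~ (as in the sample file) and contains no * or ~; the function name extract_b2 is hypothetical. It relies only on POSIX awk's match(), RSTART, RLENGTH and substr(), and keeps looping so that several B2 segments on one long line are all found.

```shell
#!/bin/sh
# extract_b2 (hypothetical helper): print the value between "B2**ETRV**"
# and "**PP~" for every such segment on stdin. Assumes the value itself
# contains no "*" or "~".
extract_b2() {
    awk '{
        s = $0
        # [^*~]* cannot run past the closing "**PP~", so each match stays
        # inside a single B2 segment.
        while (match(s, /B2\*\*ETRV\*\*[^*~]*\*\*PP~/)) {
            # 10 = length("B2**ETRV**"), 5 = length("**PP~")
            print substr(s, RSTART + 10, RLENGTH - 15)
            # continue scanning after this match
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}
```

Run it as extract_b2 < filename. On the one-line file in the question it should print Amol990 and then 807165263100; a file with one segment per line works the same way, because each B2 segment sits on a single line.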

The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract. If this is an issue, and you can afford to read the whole file into memory at once, set

$/ = undef;

(and restore its previous value after reading the file).
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Sergey Babkin wrote:
The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract.


I was waiting for the "Arrrrrgh but" response that is indicative of creeping requirements!
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4634

Your regular expression is greedy. You need to use the non-greedy version.

A greedy expression tries to match the longest possible string, so the wildcard is expanded and expanded and expanded.....
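To make the greedy/non-greedy distinction concrete: the POSIX regular expressions that awk uses have no non-greedy *? quantifier (that is a Perl feature), and a .* match runs as far to the right as it can. The usual substitute is a negated character class that cannot cross the segment terminator ~. This is a small comparison sketch; the function name show_greedy and the shortened sample line are illustrative assumptions.

```shell
#!/bin/sh
# show_greedy (hypothetical helper): compare a greedy ".*" with a
# character class that cannot cross a "~" segment boundary.
show_greedy() {
    awk '{
        # greedy: from the first ETRV to the LAST "PP~" on the line
        if (match($0, /ETRV.*PP~/))
            print "greedy:  " substr($0, RSTART, RLENGTH)
        # bounded: [^~]* stops at the end of the current segment
        if (match($0, /ETRV\*\*[^~]*PP~/))
            print "bounded: " substr($0, RSTART, RLENGTH)
    }'
}
```

Fed a line holding two B2 segments, such as GS*SM*NFIA*ETRV~B2**ETRV**Amol990**PP~B2A*04~B2**ETRV**807165263100**PP~B2A*00~, the greedy pattern swallows everything from the first ETRV to the last PP~, while the bounded one stops at ETRV**Amol990**PP~.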
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Pat Farrell wrote:Your regular expression is greedy. You need to use the non-greedy version.


Could you explain which regex is greedy? I don't think the one the OP used in his awk script is, and my Perl one certainly isn't.
Kurosaki Ichigo
Greenhorn

Joined: Dec 31, 2010
Posts: 10
James Sabre wrote:I would use probably Perl not awk -

awk can do the job just as well; for file parsing it is sometimes even better and faster than Perl.
Kurosaki Ichigo
Greenhorn

Joined: Dec 31, 2010
Posts: 10
amol Bhandwale wrote:Hi All,
And want to extract the string between this segment ~B2**ETRV**Amol990**PP~B2A*04~ ie Amol990

i tried with

awk '/ETRV/, /PP~B2A/ ' filename


This is one way you can do it:

Set the record separator to PP~B2A, then set the field separator to ETRV. The last field of each record that ends in PP~B2A holds the value, still wrapped in **.
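The record/field separator idea can be sketched as follows, with two caveats stated as assumptions. First, POSIX leaves the behaviour of a multi-character RS such as PP~B2A unspecified (gawk and mawk treat it as a regex), so this version emulates it portably with split() on the whole input. Second, the last field still carries the ** delimiters, which gsub() strips. The piece after the final PP~B2A has no terminator and is skipped, and lines are joined first so the sketch also copes with files where PP~ and B2A sit on separate lines. The function name extract_by_separators is hypothetical.

```shell
#!/bin/sh
# extract_by_separators (hypothetical helper): records end at "PP~B2A",
# fields are split on "ETRV", and the value is the last field of each
# complete record. Assumes every complete record contains "ETRV".
extract_by_separators() {
    awk '
        # Join all lines so a "PP~B2A" split across lines still matches.
        { all = all $0 }
        END {
            n = split(all, recs, "PP~B2A")
            # The last piece has no terminating "PP~B2A", so skip it.
            for (i = 1; i < n; i++) {
                m = split(recs[i], f, "ETRV")
                v = f[m]
                gsub(/\*/, "", v)   # drop the "**" around the value
                print v
            }
        }'
}
```

Run it as extract_by_separators < filename. With gawk you could instead set RS="PP~B2A" and FS="ETRV" directly and skip the unterminated final record by checking RT, but the split() form above avoids depending on that extension.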
 