
deleting unwanted lines in unix

 
amit prajapati
Greenhorn
Posts: 5

Hi,

My input file has data in the format below:

9090909090,3567,1
9876090000,4098,0
98,1

I want to delete third where first field is not 10 digits in length. How can I do this?

Regards,
Amit P

 
Richard Tookey
Bartender
Posts: 1166
I'm having trouble understanding your statement "I want to delete third where first field is not 10 digits in length." since none of your example lines have a third field when the first field is not 10 digits. Only one line does not have 10 digits in the first field but it has no 3rd field!

P.S. This type of requirement is usually most easily implemented using 'awk'.
 
Richard Tookey
Bartender
Posts: 1166
I think I have just understood your requirement! I assume that by 'third' you mean the third line in your sample and not the third field in any line. I then extrapolate and assume you simply want to remove all lines that do not have a first field that is 10 decimal digits in length. This is almost trivial using 'awk' and in form is pretty much the most basic example of 'awk' one finds. There are numerous tutorials and Google will find them.

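There is one small gotcha. Some 'awk' implementations (older versions of gawk unless run with --re-interval, and older versions of mawk) do not support the {n} repeat syntax in regular expressions, so to be portable you will need to write the regex for a decimal digit 10 times to get your 10 decimals.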
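For anyone landing here later, a minimal sketch of the approach described above. It assumes the requirement is "keep only lines whose first comma-separated field is exactly 10 decimal digits"; the function name keep_ten_digit_lines is made up for illustration, and the digit class is written out ten times so the pattern does not depend on {n} support.

```shell
# Hypothetical helper; reads the file(s) given as arguments, or stdin.
keep_ten_digit_lines() {
    # -F, splits each line on commas, so $1 is the first field.
    # [0-9] is repeated ten times and anchored with ^ and $, so the
    # first field must be exactly 10 digits. Matching lines are printed
    # by awk's default action; all other lines are dropped.
    awk -F, '$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/' "$@"
}

# The sample data from the original post:
printf '%s\n' '9090909090,3567,1' '9876090000,4098,0' '98,1' | keep_ten_digit_lines
# prints:
# 9090909090,3567,1
# 9876090000,4098,0
```

On a real file this would be called as keep_ten_digit_lines input.txt > output.txt (file names are placeholders), which writes the filtered lines to a new file rather than editing in place.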

 
Jim Venolia
Ranch Hand
Posts: 246
I realize this is 2 months old, but in Java you could do something like:



You could also change the regex to ([0-9]{10,10}) and skip the 'length() == 10' test.
 
Campbell Ritchie
Sheriff
Posts: 48976
Jim Venolia wrote: . . .. . .
I presume that is pseudocode, not Java. Surely that line should read
 
Richard Tookey
Bartender
Posts: 1166
Jim Venolia wrote:
You could also change the regex to ([0-9]{10,10}) and skip the 'length() == 10' test.


Sorry, but neither this regular expression nor the one in your code meets the original requirement, since the match must be exactly 10 decimal digits in the first field only, and yours do not enforce that. Both of yours would match

"xxx12345678,34567891234,56789"

One could easily modify the Java regular expression (see the note) to match exactly the OP's requirement, but why would one when it takes a single invocation of 'awk' (which is available on all Linux distributions) with the script as part of the command line? This single line would process the whole file!

Note - Rather than "([0-9]{10,10})" one would use "^([0-9]{10})," or "^(\\d{10})," and one would then not need to check that the length is exactly 10.
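The difference between the unanchored and anchored patterns can be checked from the command line. This is only a sketch: it uses grep -E as a stand-in for the Java regex engine (an assumption; the two agree for these simple patterns), and the function names match_unanchored and match_anchored are invented for illustration.

```shell
# Hypothetical helper names, for illustration only.
# Unanchored: any run of 10 digits anywhere on the line is enough.
match_unanchored() { grep -E '[0-9]{10}'; }
# Anchored: the line must start with exactly 10 digits followed by a comma.
match_anchored() { grep -E '^[0-9]{10},'; }

sample='xxx12345678,34567891234,56789'

# False positive: 10 of the 11 digits in the second field match.
printf '%s\n' "$sample" | match_unanchored

# No match once the pattern is tied to the start of the line.
printf '%s\n' "$sample" | match_anchored || echo 'no match'

# A valid line from the original post is still accepted.
printf '%s\n' '9090909090,3567,1' | match_anchored
```

Note that the anchored pattern requires a comma after the ten digits, so a line consisting of a single 10-digit field and nothing else would not match.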
 