deleting unwanted lines in unix

amit prajapati
Greenhorn

Joined: Oct 18, 2011
Posts: 5

Hi,

My input file has data in the format below:

9090909090,3567,1
9876090000,4098,0
98,1

I want to delete third where first field is not 10 digits in length. How can I do this?

Regards,
Amit P

Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1067
    

I'm having trouble understanding your statement "I want to delete third where first field is not 10 digits in length." since none of your example lines have a third field when the first field is not 10 digits. Only one line does not have 10 digits in the first field but it has no 3rd field!

P.S. This type of requirement is usually most easily implemented using 'awk'.
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1067
    

I think I have just understood your requirement! I assume that by 'third' you mean the third line in your sample and not the third field in any line. I then extrapolate and assume you simply want to remove all lines that do not have a first field that is 10 decimal digits in length. This is almost trivial using 'awk' and in form is pretty much the most basic example of 'awk' one finds. There are numerous tutorials and Google will find them.

There is one small gotcha. Some 'awk' implementations do not accept {n} type syntax for quantifying a repeat in a regular expression, so to be safe you can write the regex for a decimal character 10 times to get your 10 decimals.
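
A minimal sketch of the approach described above, assuming the data is comma-separated as in the question and that every wanted line has at least a second field. The function name keep_ten_digit_lines is made up for illustration. The digit class is written out ten times and anchored between the start of the line and the first comma, so only the first field is tested. Where the 'awk' in use does support interval expressions, [0-9]{10} is an equivalent shorter form; the repeated form does not depend on that.

```shell
# Print only lines whose first comma-separated field is exactly 10 decimal digits.
# keep_ten_digit_lines is a hypothetical helper name. It reads the files given
# as arguments, or standard input when no file is given.
keep_ten_digit_lines() {
    # ^ anchors at the start of the line; the trailing comma ends the first field.
    awk '/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9],/' "$@"
}

# Example with the sample data from the question:
printf '%s\n' '9090909090,3567,1' '9876090000,4098,0' '98,1' | keep_ten_digit_lines
```

With the three sample lines, the first two are printed and the "98,1" line is dropped. An 'awk' pattern with no action prints the matching line, which is why no explicit print is needed.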

Jim Venolia
Ranch Hand

Joined: Sep 07, 2013
Posts: 154
    

I realize this is 2 months old, but in Java you could do something like:



You could also change the regex to ([0-9]{10,10}) and skip the 'length() == 10' test.

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39393
    
Jim Venolia wrote: . . .. . .
I presume that is pseudocode, not Java. Surely that line should read
Richard Tookey
Ranch Hand

Joined: Aug 27, 2012
Posts: 1067
    

Jim Venolia wrote:
You could also change the regex to ([0-9]{10,10}) and skip the 'length() == 10' test.


Sorry, but neither this regular expression nor the one in your code meets the original requirement, since the match must be exactly 10 decimal characters in the first field only, and yours do not enforce that. Both of yours would match

"xxx12345678,34567891234,56789"

One could easily modify the Java regular expression (see the note) to match exactly the OP's requirement, but why would one when it takes a single invocation of 'awk' (which is available on all Linux distributions) with the script as part of the command line? This single line would process the whole file!

Note - Rather than "([0-9]{10,10})" one would use "^([0-9]{10})," or "^(\\d{10}),", and one would then not need to check that the length is exactly 10.
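
For comparison with the Java regular expressions in the note, here is a rough sketch of what the single 'awk' invocation could look like when the unwanted lines are to be deleted from the file itself. It is only one possible form, not necessarily the one meant above. The function name delete_bad_lines and the use of a temporary file from mktemp are assumptions made for illustration. This version tests the first field directly instead of using a regex with a repeat count, so it also keeps a line that consists of a 10-digit first field and nothing else.

```shell
# Remove, in place, every line whose first comma-separated field is not
# exactly 10 decimal digits. delete_bad_lines is a hypothetical helper name.
delete_bad_lines() {
    file=$1
    tmp=$(mktemp) || return 1
    # -F, splits each line on commas. A line is kept only when field 1 has
    # length 10 and contains no character other than 0-9.
    if awk -F, 'length($1) == 10 && $1 !~ /[^0-9]/' "$file" > "$tmp"; then
        # Copy back rather than mv, so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as delete_bad_lines input.txt, where input.txt stands for whatever the data file is actually named.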
 