Need help for Shell Script to reduce a file

Sam Saha
Ranch Hand

Joined: Jan 23, 2010
Posts: 104
Hi,

I am new to UNIX and UNIX scripting. I have a log file sitting on a UNIX server. The file is fairly big, roughly 50,000 lines, and I want to write a UNIX shell script that removes the duplicate data and produces a much smaller file (maybe 100-200 lines or so). I really don't know how to write that script. It would be great if someone could help me with a sample script. Please let me know if you need any other information. Thank you.

Peter Johnson
author
Bartender

Joined: May 14, 2008
Posts: 5811
    
Could you show us some example log lines (both duplicate lines and unique lines)?

Do the lines have timestamps in them? If so, then filtering will be much more difficult.

I'm sure someone with Lisp expertise could write a one-liner (with lots of parentheses) to do this, but if I were to do it I would use Python, PHP, or some other higher-level scripting language; I wouldn't even want to think about how to do it in bash.
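That said, if the duplicate lines turn out to be exact byte-for-byte repeats (no timestamps to worry about), two standard shell idioms already cover it. The file names below are placeholders, and a tiny sample file stands in for the real 50,000-line log:

```shell
# A tiny sample file stands in for the real log.
printf 'a\nb\na\nb\nc\n' > app.log

sort -u app.log > app.sorted.log            # dedupes, but reorders lines
awk '!seen[$0]++' app.log > app.dedup.log   # dedupes, keeps original order
```

The awk version keeps the first occurrence of each line in its original position, which usually matters for logs.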


JBoss In Action
Sam Saha
Ranch Hand

Joined: Jan 23, 2010
Posts: 104
Yes, I have timestamps in the log file.

Here are some example duplicate lines from the file:

Peter Johnson
author
Bartender

Joined: May 14, 2008
Posts: 5811
    

Are you using Log4J to write these log entries? If so, this can be solved by setting "additivity" to false for your logger.
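For reference, the additivity change would look roughly like this in a log4j.properties file; the logger and appender names here are hypothetical, since the application's actual configuration is unknown:

```properties
# Hypothetical log4j.properties sketch; the logger name com.example.app
# and the appender name FILE are placeholders for whatever the app uses.
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=app.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%d] %-5p %c - %m%n

# This logger already writes to FILE itself, so stop it from also passing
# events up to the root logger, which would log each entry twice.
log4j.logger.com.example.app=INFO, FILE
log4j.additivity.com.example.app=false
```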
Sam Saha
Ranch Hand

Joined: Jan 23, 2010
Posts: 104
I did not write the application, so I do not know whether the people who wrote it are using Log4J. All I have is the log files, and I have to reduce them. As I am very new to UNIX and UNIX scripting, I am wondering whether we can write a UNIX shell script to remove the duplicate records from the log and make it smaller. If that is possible, could you please send me some sample code showing how to do it? I would really appreciate that.
Peter Johnson
author
Bartender

Joined: May 14, 2008
Posts: 5811
    

I would first try to find the log4j configuration file and edit it so that the duplicates are not written in the first place.

I think that a general algorithm for printing only unique log entries would be:

    read next line from log
    if line is not the same as the saved previous line, print line
    save line as the previous line
    repeat until end of log
The "read next line from log" step is a little complicated because you have to read the entire log entry, which appears to span multiple physical lines based on what you displayed. The algorithm assumes that duplicate entries are adjacent. I would be comfortable tackling this in Java, or perhaps in Python or PHP. Someone might be able to do it in a line or two of Lisp. I wouldn't even try this in bash (though I'm sure it could be done).
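If the duplicates really are adjacent, the sketch below is one way to express that algorithm in awk. The timestamp format, the sample entries, and the file names are all assumptions, not taken from Sam's actual log: it assumes each entry starts with a bracketed timestamp, continuation lines do not, and the timestamp is stripped before comparing so that entries differing only in their timestamps still count as duplicates.

```shell
# A tiny sample file with an assumed [timestamp] format stands in
# for the real log; the second entry duplicates the first.
printf '%s\n' \
  '[2010-01-23 10:00:01] ERROR Connection refused' \
  '    at com.example.Dao.query' \
  '[2010-01-23 10:00:02] ERROR Connection refused' \
  '    at com.example.Dao.query' \
  '[2010-01-23 10:00:05] WARN  Retrying' > app.log

awk '
  function flush() {
    # print the buffered entry only if its body differs from the last one
    if (entry != "" && body != prev) print entry
    if (entry != "") prev = body
    entry = ""; body = ""
  }
  /^\[/ {                           # timestamped line: a new entry begins
    flush()
    entry = $0; body = $0
    sub(/^\[[^]]*\] */, "", body)   # compare without the timestamp
    next
  }
  { entry = entry "\n" $0; body = body "\n" $0 }   # continuation line
  END { flush() }
' app.log > app.dedup.log
```

Running this on the sample keeps the first ERROR entry (both of its lines) plus the WARN entry, and drops the adjacent duplicate.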
 