
Need help with a shell script to reduce a file

 
Sam Saha
Ranch Hand
Hi

I am new to UNIX and UNIX scripting. I have a log file sitting on a UNIX server. The file is quite large, about 50,000 lines. I want to write a UNIX shell script that removes duplicate data from the file and produces a much smaller file (perhaps 100-200 lines or so). I really don't know how to write that script. It would be great if someone could help me with a sample script. Please let me know if you need any other information. Thank you.

 
Peter Johnson
author
Bartender
Could you show us some example log lines (both duplicate lines and unique lines)?

Do the lines have timestamps in them? If so, then filtering will be much more difficult.
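If they do, you would have to strip the timestamp off before comparing, since otherwise no two entries would ever look identical. Just as a sketch of the idea (this assumes the timestamp is the first two whitespace-separated fields and that duplicate entries sit next to each other; app.log and smaller.log are placeholder names):

    awk '{ msg = $0; sub(/^[^ ]+ [^ ]+ /, "", msg) }  # drop the timestamp fields
         msg != prev { print }                        # print only when the message part changes
         { prev = msg }' app.log > smaller.log

You would have to adjust the pattern to match your actual timestamp format.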

I'm sure someone with Lisp expertise could write a one-liner (with lots of parentheses) to do this, but if I were to do it I would have to use Python, PHP, or some other higher-level scripting language; I wouldn't even want to think about how to do it in bash.
 
Sam Saha
Ranch Hand
Yes, I have timestamps in the log file.

Here are some example duplicate lines from the file:

 
Peter Johnson
author
Bartender
Are you using Log4J to write these log entries? If so, this can be solved by setting "additivity" to false for your logger.
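For example, in a log4j.properties file the situation and the fix would look something like this (com.example.app is just a placeholder for your application's logger name):

    # The same appender is attached to both the root logger and a child
    # logger. With additivity left at its default (true), every event from
    # com.example.app reaches the appender twice and is written twice.
    log4j.rootLogger=INFO, logfile
    log4j.logger.com.example.app=DEBUG, logfile

    # Stop events from com.example.app from also propagating up to the
    # root logger; each entry is then written only once.
    log4j.additivity.com.example.app=false

    log4j.appender.logfile=org.apache.log4j.FileAppender
    log4j.appender.logfile.File=app.log
    log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
    log4j.appender.logfile.layout.ConversionPattern=%d %-5p %c - %m%n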
 
Sam Saha
Ranch Hand
I did not write the application, so I do not know whether the developers are using Log4J. All I have are the log files, and I have to reduce them; that is the idea. As I am very new to UNIX and UNIX scripting, I am wondering whether we can write a UNIX shell script to remove the duplicate records from the log and make it smaller. If that is possible, could you please send me some sample code showing how to do it? I would really appreciate that.
 
Peter Johnson
author
Bartender
I would first try to find the log4j configuration file and edit it to remove the duplicates.

I think that a general algorithm for printing only unique log entries would be:
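Something along these lines (rough pseudocode; "line" here means one complete log entry):

    previous = (nothing)
    while not at end of log:
        read next line from log
        if line != previous:
            print line
        previous = line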
The "read next line from log" step is a little complicated because you have to read an entire log entry, which appears to span multiple physical lines based on what you displayed. The algorithm also assumes that duplicate entries will be adjacent. I would be comfortable tackling this in Java, or perhaps in Python or PHP. Someone might be able to do this in a line or two of Lisp. I wouldn't even try this in bash (though I'm sure it could be done).
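(For what it's worth: if each entry did fit on one physical line, the standard uniq command already implements exactly this algorithm, since it drops adjacent duplicate lines:

    uniq app.log smaller.log

or, spelled out in awk:

    # Print a line only when it differs from the line immediately before it.
    awk '$0 != prev { print } { prev = $0 }' app.log > smaller.log

It's the multi-line entries and the timestamps that make it messier.)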
 