Search on a string

SaurabhSri Sri
Ranch Hand

Joined: May 01, 2008
Posts: 43
Hi All,

I want to write a shell script that searches a log directory for the current log file (named like IMSA-2008-07-04.log) and then reads the last line of that file. If it finds "weblogic.jms.common.LostServerException", it should send a mail to, say, abc@domain.com. I have tried but got stuck, as I am new to shell scripting. Please help me out.
This is what I am trying (please don't be angry) -



Thanks a lot.


Regards
SaurabhSri (SCJP 1.5)
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11422

Hi SaurabhSri

It looks like you are working towards a viable solution here. You haven't mentioned where you are stuck, which makes it hard to determine what advice to give.

The one problem I see is that your grep command is wrong. Grep is supposed to find lines in a file. So in your case, grep is assuming that $out is a filename, not a string to match on. The easiest change to this would be:
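A minimal sketch of that change, assuming `$out` already holds the last line of the log and `$err1` holds the text to look for (the sample line below is made up for illustration):

```shell
err1="weblogic.jms.common.LostServerException"
# Made-up sample line standing in for the output of "tail -1" on the log file.
out="weblogic.jms.common.LostServerException: connection to server lost"

# Pipe the string into grep instead of passing it as a filename argument.
if echo "$out" | grep -w "$err1" > /dev/null
then
    found="yes"
else
    found="no"
fi
echo "Error found: $found"
```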


You will notice that I removed the -l switch from your grep statement. The -l switch tells grep to list the filename the string was found in. But we don't have a filename - we are looking at the $out variable, so -l makes no sense.

That should get your script working, however since we are looking at the grep command, some other suggestions for you:

The -w option specifies to match whole words only. Given that you are already looking for complex strings I doubt this is necessary. If you were searching for the word "the", you would probably want to use -w so that you don't match on "another" or "theme". But it is highly unlikely that "weblogic.jms.common.LostServerException" is a part of a larger word. So although the -w switch only adds a little bit of extra work, why not save yourself that effort? The script should run the same without it.

Since you are ignoring the output of the grep statement, why not specify the -q switch (quiet mode)? That way you will not get any standard output and can therefore remove the "> /dev/null" redirect of standard output.

Other than that, your script does appear to do what it is designed to do. There is one bit that I find questionable - you are only looking at the very last line of the log file to see if the error occurs. Personally I find that it is rare for me to be so lucky as to find the message I need on the last line (perhaps you have written your application such that this is the last line that ever gets logged, in which case this doesn't matter). Normally though, I work on the assumption that there are always 100 good lines in a log file, and just look for the error in the last 100 lines - if necessary, I might add some backoff capability (for example, track what time I saw the error, and don't report new errors until I see good log messages after the bad log messages). Anyway, this is getting a bit away from your problem.
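To make the last two suggestions concrete (the -q switch and looking at more than one line), here is a rough sketch. It assumes a throwaway log file with made-up contents so that it can be run anywhere; the real file name and the number of lines to check would differ.

```shell
err1="weblogic.jms.common.LostServerException"

# Build a temporary log file with made-up contents for the demonstration.
myFile=$(mktemp)
{
    echo "2008-07-04 10:15:30 INFO starting up"
    echo "${err1}: connection lost"
    echo "2008-07-04 10:15:35 INFO retrying"
} > "$myFile"

# -q keeps grep silent; only its exit status is used by the "if".
if tail -n 100 "$myFile" | grep -q "$err1"
then
    recent_error="yes"
else
    recent_error="no"
fi
echo "Error in last 100 lines: $recent_error"
rm -f "$myFile"
```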

As I said at the start, your script looks fine (except for the grep issue). It is not the way I would write it, but the old adage is correct: give a problem to 2 programmers and you will get at least 3 different ways to solve it. So I assume that you are stuck working out how to email something out.

Assuming you have sendmail set up correctly on your system, you can simply use the mail command to send out details. Something as simple as:
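For instance, a sketch along these lines. The names `send_alert` and `MAILER` are invented here for illustration; `MAILER` simply defaults to the standard `mail` command and exists only so the sketch can be tried without sending anything.

```shell
# send_alert and MAILER are hypothetical names used only in this sketch.
# MAILER defaults to the standard "mail" command; override it for a dry run.
send_alert() {
    # $1 = error text, $2 = log file name, $3 = recipient address
    echo "Found $1 in $2" | "${MAILER:-mail}" -s "Error in $2" "$3"
}

# Example call (commented out so that nothing is sent by accident):
# send_alert "weblogic.jms.common.LostServerException" "IMSA-2008-07-04.log" abc@domain.com
```

The -s switch sets the subject line; whatever arrives on standard input becomes the body.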

That should send a message to the person that will look something like:

Now, as I've said twice before, your script is reasonable. But some things I think might be better (feel free to ignore them):
  • Personally I prefer any changeable variables to all be at the top of the script. So I would start my script as:

    err1="weblogic.jms.common.LostServerException"
    err2="java.io.EOFException"
    echo "ERRORS :- $err1 $err2"

    LOG_PREFIX="IMSA-"
    LOG_DATE_FORMAT="%Y-%m-%d"
    LOG_SUFFIX=".log"



  • I personally prefer to put curly braces around all variables that are to be expanded. So:


    myDate=`date "+${LOG_DATE_FORMAT}"`

    In most cases it is not needed. So $LOG_DATE_FORMAT will work just as well as ${LOG_DATE_FORMAT}. But in some cases the shell won't be able to determine where your variable name ends, and it can be a pain to try and debug. Putting the braces in makes it explicit.

  • You do not need to echo each variable that you want to concatenate into another variable. If I were building the "myFile" variable I would use:


    myFile="${LOG_PREFIX}${myDate}${LOG_SUFFIX}"

    This, by the way, is one place where there are (at least) three ways of building the string, and my way only has one set of quotes, but this requires the curly braces. A similar concept without the curly braces would be:

    myFile="$LOG_PREFIX""$myDate""$LOG_SUFFIX"

    So now we have 3 ways of building the string: your way, my preferred way, and an alternate way. See what I mean about ask 2 programmers how to do something and you will get at least 3 solutions? None of them are necessarily wrong, so just choose which one you like best.

  • As a general rule, it is considered undesirable to spawn more processes than necessary. In your version, you had:


    myFile="`echo "IMSA-"``echo $myDate``echo ".log"`"

    This, in theory, would have spawned 3 instances of the "echo" command. In practice this is probably not a problem as you are not doing any heavy processing with the processes you are spawning. I am not entirely sure if extra processes would be spawned in this particular instance anyway, as most shells have "echo" as a built in command.

  • You are going to have 2 identical blocks to search for each of the errors. One alternative is to use extended regular expressions in grep so you can search for both at once. Something like:


    echo "$out" | grep -Eq "${err1}|${err2}"

    So now we are searching for both in a single pass. Since we are only looking at a single line, this is not a big deal, but if you were looking for the error in the last 100 lines of the log file, then it will save you going through those lines twice (plus it means there is only one block of code looking for the error).

  • An alternative would be to add a function that could be called to look for each error. That is a larger subject, so I'll not go into that here.


  • The email command I gave had very little information in the body of the email - just the error. Assuming that the error in your log file may be preceded by some contextual information that could be useful in diagnosing the problem, you could try including that in the email. Something like:


    grep -E -B 10 "${err1}|${err2}" "${myFile}" | mail -s "Error in ${myFile}" abc@domain.com

    The email will look ugly but it may be more useful for diagnostic purposes.
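Pulling several of the points above together (settings at the top, curly braces, a single concatenation, one extended-regex search, and a function), a possible sketch follows. `has_error` and `LINES_TO_CHECK` are names made up for this example and are not part of the original script.

```shell
# Settings that are likely to change, kept together at the top.
err1="weblogic.jms.common.LostServerException"
err2="java.io.EOFException"
LOG_PREFIX="IMSA-"
LOG_DATE_FORMAT="%Y-%m-%d"
LOG_SUFFIX=".log"
LINES_TO_CHECK=100        # hypothetical setting, not in the original script

# has_error is a hypothetical helper: it succeeds if the pattern in $1
# appears in the last LINES_TO_CHECK lines of the file named in $2.
has_error() {
    tail -n "${LINES_TO_CHECK}" "$2" | grep -Eq "$1"
}

myDate=$(date "+${LOG_DATE_FORMAT}")
myFile="${LOG_PREFIX}${myDate}${LOG_SUFFIX}"

if [ -f "${myFile}" ] && has_error "${err1}|${err2}" "${myFile}"
then
    echo "Error found in ${myFile}"
else
    echo "No error found in ${myFile}"
fi
```

Keeping the search in one function means a third error only needs a change to the pattern passed in, not another copy of the search block.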


    So - does any of this help? Or were you having some other problem that I haven't even noticed?

    Regards, Andrew


    SaurabhSri Sri
    Ranch Hand

    Joined: May 01, 2008
    Posts: 43
    Hi Andrew,

    First, I am really thankful to you for such a descriptive answer; it is what I was looking for. I see where I went wrong and am trying to correct those points.
    Here, I want to share with you what exactly is going on (earlier I did not take a close look).
    Actually, the production log file is very big and is also updated very frequently. So "tail -1 myfile" probably won't work. Can you please shed more light on this sentence of yours -

    I work on the assumption that there are always 100 good lines in a log file, and just look for the error in the last 100 lines - if necessary, I might add some backoff capability (for example, track what time I saw the error, and don't report new errors until I see good log messages after the bad log messages).


    Again, Thank you so much for your help
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11422

    Hmm, I'll see what I can do.

    Here is some actual output in one of my old log files:

    Now I have a couple of options here. I could assume that the line prior to the exception is a useful message (normally the case). So I could get some context in my output:

    By the way - the reason I am tailing 2,375 lines in my log file from 2 days ago is because I had to go that far back to work with a real example. I recommend you play with different numbers to work out what will find your Exception.

    Given that, I can use an awk command to determine the date/time:

    My awk command says that for each line of output, if it starts with a number (/^[0-9]/) then print the first 2 parameters.
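A sketch of that pipeline, run against a made-up log excerpt (the real log format, exception name and number of lines to tail are assumptions here). Note that -B is an extension found in GNU grep rather than a POSIX option.

```shell
# Made-up log excerpt; the real log format may differ.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
2008-07-04 10:15:30 INFO Starting poll
2008-07-04 10:15:33 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 10:15:40 INFO Poll finished
EOF

# grep -B 1 keeps the line before each match; awk then prints the first
# two fields of any line that starts with a digit.
stamp=$(tail -n 100 "$logFile" | grep -B 1 "java.net.SocketException" | awk '/^[0-9]/ {print $1, $2}')
echo "$stamp"
rm -f "$logFile"
```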

    Unfortunately it is possible that your error might occur multiple times in the last 100 lines. For example, switching to the cat command, I can see that I have 3 occurrences:

    I only really want the last one, so I will change my awk statement to keep track of the date and time in global variables, and at the end print those global variables. Since it is a global variable, it will be overwritten each time:
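As a rough illustration, with a made-up log that contains the same exception three times, the awk statement could remember the date and time in two variables (`d` and `t`, names chosen here) and print them only in the END block:

```shell
# Made-up log excerpt with three occurrences of the same exception.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
2008-07-04 09:00:01 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 09:30:02 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 10:15:33 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 10:15:40 INFO Poll finished
EOF

# d and t are overwritten on every timestamped line, so only the last pair survives.
last=$(grep -B 1 "java.net.SocketException" "$logFile" | awk '/^[0-9]/ {d=$1; t=$2} END {print d " " t}')
echo "$last"
rm -f "$logFile"
```

If nothing matches, the END block still runs and prints a single space, because both variables are empty.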

    Now I have a date/time stamp that I can put in my email, and more importantly use as a way of tracking whether I have already reported this problem:

    The first block just looks to see if there are any reported errors in a file. This way we can make sure we don't report the same error twice.

    The second block is the same command we have been working with - we are just storing the output of our commands in a variable.

    The third block compares the error we just found with any previously reported errors. If it is identical, then we don't need to re-report it. If it is not identical, then we send off an email. It is also possible that there are no errors found in the last 'x' lines of the log file, in which case "$ERR_FOUND" will be equal to a space (since the awk command I wrote always has a space in it). So that special case is handled - the errors reported must be different and the last error cannot be a space.

    This is the essence of my backoff capability. If I have reported the error already, then I "back off" from the problem - it doesn't matter how often this script gets called, I will not generate new emails warning of the same problem. (Can you imagine if this script ran every 30 seconds and didn't have some way of identifying that it had already reported the error? That is 1 email every 30 seconds until someone has fixed the problem. If a technician can't get to a terminal for half an hour, that would be 60 emails. Yuck!)
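The three blocks described above might look roughly like the sketch below. The helper name `report_if_new`, the state file argument and the `MAILER` override are all assumptions made for this illustration; the variable names `LAST_REPORTED` and `ERR_FOUND` are likewise only suggestions.

```shell
# report_if_new is a hypothetical helper for this sketch.
# $1 = log file, $2 = state file that remembers the last reported timestamp.
report_if_new() {
    # Block 1: read what was reported previously, if anything.
    LAST_REPORTED=""
    if [ -f "$2" ]; then LAST_REPORTED=$(cat "$2"); fi

    # Block 2: timestamp of the most recent error (a single space if none).
    ERR_FOUND=$(tail -n 100 "$1" | grep -B 1 "java.net.SocketException" |
        awk '/^[0-9]/ {d=$1; t=$2} END {print d " " t}')

    # Block 3: report only a real error that differs from the last report.
    if [ "$ERR_FOUND" != " " ] && [ "$ERR_FOUND" != "$LAST_REPORTED" ]
    then
        echo "$ERR_FOUND" > "$2"
        echo "Error at $ERR_FOUND in $1" | "${MAILER:-mail}" -s "Error in $1" abc@domain.com
    fi
}

# Example call (commented out so that nothing is sent by accident):
# report_if_new "IMSA-2008-07-04.log" "/tmp/last_reported_error"
```

Calling it repeatedly against an unchanged log sends one email, because the second call finds the same timestamp already stored in the state file.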



    Of course, I have now lost the reporting of what the error was. So let's go back to that awk statement:

    (I decided to break this over multiple lines - my single line scripts were getting too wide.)

    The first block we have seen before. It just assumes that any line that starts with a number must be a date stamp followed by a time stamp.

    The next block searches for the word "Exception" anywhere in the line, and if found, it assumes that the first word must be the name of the exception. (Note that in this usage, "word" is anything that is not whitespace - the period (.) between java and net does not make them 2 separate words).

    Finally, the special case END block joins all 3 together. So my email command could be:
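A sketch of such a three-block awk statement, run on a made-up log excerpt (the variable names `d`, `t` and `e` are chosen here, and the mail line is left commented out):

```shell
# Made-up log excerpt; the real log format may differ.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
2008-07-04 10:15:30 INFO Starting poll
2008-07-04 10:15:33 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 10:15:40 INFO Poll finished
EOF

summary=$(tail -n 100 "$logFile" | grep -B 1 "Exception" | awk '
    /^[0-9]/    { d = $1; t = $2 }   # line starting with a digit: date and time
    /Exception/ { e = $1 }           # exception line: first word is its name
    END         { print d, t, e }    # join all three
')
echo "$summary"
rm -f "$logFile"

# The summary could then become the body of the email, for example:
# echo "$summary" | mail -s "Error in log" abc@domain.com
```

With this sample the exception name keeps its trailing colon, since awk splits on whitespace only.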



    All of this so far has been assuming that I could use the date/time stamp from the previous line. But if the previous line didn't have a date/time stamp, then I would have to rethink things slightly. One option might be to use the line numbers:

    This could actually make my awk statement easier:
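As an illustration of the line-number idea, `grep -n` prefixes each match with its line number and a colon, so awk only has to split on the colon. The sample log is made up, and the numbers are relative to whatever `tail` passes along.

```shell
# Made-up log excerpt in which the line before the exception has no timestamp.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
2008-07-04 10:15:30 INFO Starting poll
no timestamp on this line
java.net.SocketException: Connection reset
2008-07-04 10:15:40 INFO Poll finished
another line without a timestamp
java.net.SocketException: Connection reset
EOF

# Remember the line number of each match; only the last one is printed.
lastLine=$(tail -n 100 "$logFile" | grep -n "Exception" | awk -F: '{ n = $1 } END { print n }')
echo "Last exception at relative line ${lastLine}"
rm -f "$logFile"
```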

    And so on ...

    That is a lot to take in, so feel free to ask more questions.

    (And there are plenty of experts in this forum, some of whom may be shaking their heads at my "solution" - feel free to chip in with ideas.)

    Regards, Andrew
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11422

    In thinking about my answer some more, I realized I didn't really cover the state whereby the system auto-recovers. In this case, I am really interested in seeing if the error message was the last message seen, or whether there was a message that suggests that the system is working correctly.

    For my example, I am going to use the same log file I had earlier, and use a fictitious example whereby I still have the same exception, but if I see that Hudson is polling the Source Control system then I will assume that everything is fine. (This example works for me because the application that is causing my exception and Hudson are both logging to the same log file. However, it is a fictitious example, as the exception I am using really has nothing to do with Hudson.) So my log file looks something like:

    Note that even though the Hudson log message is appearing as an ERROR, this is actually a good message. It is just output at the wrong log level (actually going to STDERR).

    Since Hudson is polling my SCM twice every 15 minutes, there are rather a lot of log messages, so I am going to reduce the amount of data I am looking at by only looking at lines 1300 - 1400 in my log file:

    So, what I want in this case, is to only see Exceptions and Hudson messages:
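A scaled-down sketch of those two steps: `sed -n 'M,Np'` prints only lines M through N, and `grep -n -E` then keeps and numbers the interesting lines. The log contents, including the wording of the Hudson line, are made up, and the range 3 to 7 stands in for 1300 to 1400.

```shell
# Made-up log excerpt; a real Hudson message will be worded differently.
logFile=$(mktemp)
cat > "$logFile" <<'EOF'
2008-07-04 10:00:00 INFO startup
2008-07-04 10:00:05 INFO ready
2008-07-04 10:01:00 WARN Connection reset by peer
java.net.SocketException: Connection reset
2008-07-04 10:02:00 INFO something unrelated
2008-07-04 10:15:00 ERROR Hudson polling SCM
2008-07-04 10:16:00 INFO something unrelated
EOF

# Line numbers from grep -n are relative to the start of the slice, not the file.
interesting=$(sed -n '3,7p' "$logFile" | grep -n -E "Exception|Hudson")
echo "$interesting"
rm -f "$logFile"
```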

    So in this case I can see that there were 3 exceptions, at relative line numbers 2, 29, & 56 (remember I chose to only look at 100 lines of my log file, so these are offset from 1300). But after the exceptions, Hudson still ran, so (in my fictitious example) I can assume that all is well.

    This has required me to read the output to determine whether auto-recovery worked. In reality it would be better if I kept track of the relative line numbers, and only reported a problem if auto-recovery wasn't noticed:
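One way to sketch that idea is to let awk remember the relative line number of the latest exception and of the latest good message, then compare them at the end. `check_recovery` is a name invented here, and the sample lines are made up.

```shell
# check_recovery is a hypothetical helper: it reads log lines on standard input
# and prints "Good" if the last good message comes after the last exception
# (or if there was no exception at all), otherwise "Bad".
check_recovery() {
    awk '
        /Exception/ { bad = NR }     # relative line number of the latest exception
        /Hudson/    { good = NR }    # relative line number of the latest good message
        END { if (bad > good) print "Bad"; else print "Good" }
    '
}

# Made-up sample: an exception followed by a later Hudson message.
result=$(printf '%s\n' \
    "2008-07-04 10:01:00 WARN Connection reset by peer" \
    "java.net.SocketException: Connection reset" \
    "2008-07-04 10:15:00 ERROR Hudson polling SCM" | check_recovery)

if [ "$result" = "Good" ]
then
    echo "Recovered on its own, nothing to report"
else
    echo "Exception with no later Hudson message"
fi
```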

    Of course, it would be better still if we were not doing string comparisons (did we get a "Good" result). So let's use a more standard way of reporting that there were no problems:

    In this case, I am using the fact that Unix programs (and most C programs) return zero if there were no problems, and another number if there were problems (Java also follows this convention - see System.exit() for more info). So I just check what the exit code was from awk (if [ $? -eq 0 ]) to determine what to do next.
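A sketch of the exit-code variant, again with made-up sample lines and an invented helper name (`recovered`). The awk program prints nothing and only sets its exit status:

```shell
# recovered is a hypothetical helper: exit status 0 means the last good
# message came after the last exception, 1 means it did not.
recovered() {
    awk '
        /Exception/ { bad = NR }
        /Hudson/    { good = NR }
        END { exit (bad > good ? 1 : 0) }
    '
}

printf '%s\n' \
    "java.net.SocketException: Connection reset" \
    "2008-07-04 10:15:00 ERROR Hudson polling SCM" | recovered
if [ $? -eq 0 ]
then
    outcome="recovered"
else
    outcome="needs attention"
fi
echo "$outcome"
```

The exit status of a pipeline is that of its last command, so `$?` here reflects the awk program.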

    I'm probably throwing too much at you, so I'll leave it to you to ask more questions when/if you desire.

    Regards, Andrew
    SaurabhSri Sri
    Ranch Hand

    Joined: May 01, 2008
    Posts: 43
    Hi Andrew,

    Thanks again for making this clearer to me.
     