I've been given an odd project, and I have no idea what the best solution for it is. Could you guys please give me a hand here?
Basically, we have lots of CORBA servers running on different machines, as well as web (Tomcat) projects and some EJBs on GlassFish. All of these apps write to one or more log files. We have Nagios monitoring our infrastructure and applications. The trend we see when things go pear-shaped is that, when we scrutinise the log files afterwards, things generally started going haywire a long time before our apps crashed. So we decided we want to monitor our log files. Each app uses a framework logger, so for each and every call we have an entry line with the time, some content and developer-specified log statements, and an exit line with the time. I call this a log set. Each log set also has its own unique ID.
The crux of the matter is, we want to monitor the log files in real time.
Now I have a couple of ideas on how to do this. But this is where you guys come in: please feel free to shoot down my ideas, because I really don't have much of a clue how best to go about this.
Idea 1: Use something like log4j's FileWatchdog to kick off a "log set" gathering process each time the log file changes. Start reading the log file from where we left off last time, and do the necessary gathering of log sets.
My concern about this is that the log files are written to very often. An app will sometimes spurt out 2000 lines a second. So when does the watchdog tell me the file has changed? Could the watchdog tell me the file changed and then, before I've finished processing that change, tell me three more times that the file has changed? It starts becoming a bit blurry for me exactly what will happen.
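To make the "start reading from where we left off" part concrete, here is a minimal sketch in plain Java. LogTailer, readNewLines and MAX_CHUNK are made-up names for illustration, not log4j classes, and the sketch assumes UTF-8 logs with newline-terminated lines. Because the method is synchronized and keeps its own byte offset, it should not matter how many times the watchdog fires: each call simply picks up whatever complete lines have been appended since the previous call.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of log4j: remembers how far into the file we have read.
class LogTailer {
    private static final int MAX_CHUNK = 1 << 20; // read at most 1 MiB per call (arbitrary limit)
    private final String path;
    private long offset = 0; // byte position just past the last complete line consumed

    LogTailer(String path) {
        this.path = path;
    }

    // synchronized: overlapping "file changed" notifications are handled one at a time
    synchronized List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            if (length < offset) {
                offset = 0; // file shrank: assume it was rolled over or truncated
            }
            int toRead = (int) Math.min(length - offset, MAX_CHUNK);
            byte[] buf = new byte[toRead];
            file.seek(offset);
            file.readFully(buf);

            // Find the last newline so a half-written final line is left for the next call.
            int lastNewline = -1;
            for (int i = buf.length - 1; i >= 0; i--) {
                if (buf[i] == '\n') {
                    lastNewline = i;
                    break;
                }
            }
            if (lastNewline < 0) {
                return lines; // no complete line yet; try again on the next notification
            }
            String text = new String(buf, 0, lastNewline, StandardCharsets.UTF_8);
            for (String line : text.split("\n", -1)) {
                // tolerate Windows line endings
                lines.add(line.endsWith("\r") ? line.substring(0, line.length() - 1) : line);
            }
            offset += lastNewline + 1;
        }
        return lines;
    }
}
```

A watchdog callback (or a timer, for that matter) would just call readNewLines() and hand the result to the log set gathering. The sketch does not deal with a single line longer than MAX_CHUNK, or with lines that were written to the old file just before a rollover.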
Idea 2: Every 5 minutes (or less, depending on exactly what "they" meant when they said "real time") gather all the log sets. Basically the same as above, except that it removes the concern above.
This, however, raises more concerns. What if the logs generated in 5 minutes take longer than 5 minutes to process? What if I can't process the massive volume of log statements faster than they're being written? Eventually this method might run forever while trying to catch up.
Idea 3: A bit sketchy on this one, but somehow subclass the log4j appender and do the log set gathering as each log line is written, in a separate thread so as not to add overhead to the logging function.
My concern here is that, by doing the gathering in a different thread, the speed of the logging means we end up with too many open threads and kill the VM.
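One way the thread count could be kept under control is to use a single worker thread fed by a bounded queue, rather than a new thread per log line. The sketch below is plain Java with made-up names (BoundedLogForwarder, offer, droppedCount); it is not a log4j class, and I am assuming a custom appender's append method would simply call offer() with the formatted line. When the queue is full the line is dropped and counted instead of blocking the logging thread, which is a design choice you may not want.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical helper: one worker thread drains a fixed-size queue of log lines.
class BoundedLogForwarder implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    BoundedLogForwarder(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                // Keep going until close() was called and the backlog is drained.
                while (running || !queue.isEmpty()) {
                    String line = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (line != null) {
                        sink.accept(line); // e.g. the log set gathering or a database write
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-forwarder");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the logging thread; never blocks. Returns false if the line was dropped.
    boolean offer(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    long droppedCount() {
        return dropped.get();
    }

    @Override
    public void close() throws InterruptedException {
        running = false;
        worker.join(); // wait for the worker to finish the backlog
    }
}
```

With this shape there is only ever one extra thread, and memory use is capped by the queue capacity. The dropped counter would at least tell us how far the monitor is falling behind.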
We want to write this all out to a database, into a flat table so that it is fast. But again, I'm concerned that this might be slower than the logging itself, and that this monitoring beast falls behind and never catches up, basically becoming useless.
Those are my ideas. Are there any better ways of doing this? I don't feel good about any of these 3 ways, because I can see how things can go horribly wrong.
Lastly, to give an indication of how big some of these log files get: I suppose the average log file is around 100 MB a day. But when things go bad, and there are stack traces everywhere, this can go up to and past 500 MB.
Thanks for the help, in advance.
Looking forward to hearing from you guys, and any ideas you might have.
Ps: I'm in South Africa, so please bear with any slow responses you might get back from me.... thanks.
You might consider adding an appender which writes the logs to a JMS queue. Then you could have something listening to that queue and doing whatever was necessary -- writing the logs to a database, emailing error logs to you, and so on. That would be off-line from the regular application so it wouldn't slow it down too seriously.
Hi Paul, and everyone else.
Thanks for the reply. I like this suggestion, but could you perhaps please clear up a concern I have with this.
A lot of the logic in processing the log file is based on the fact that one log statement follows the next. We are using thread pools, and each pooled thread has an ID. Each log set begins with a BEGIN statement and ends with a COMPLETE statement. So I know that all log lines with that thread ID belong to the same log set until I hit a log statement marked COMPLETE.
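To show what I mean, the gathering logic can be sketched roughly like this. LogSetAssembler and accept are made-up names, and the sketch assumes each line has already been parsed into a thread ID and a text that starts with BEGIN or COMPLETE at the boundaries, which is a simplification of our real format. Note that lines from different threads may interleave freely; what the logic depends on is that the lines of any one thread arrive in the order they were written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: groups log lines into log sets, keyed by thread ID.
class LogSetAssembler {
    // Log sets that have seen BEGIN but not yet COMPLETE.
    private final Map<String, List<String>> open = new HashMap<>();

    // Feed one parsed line; returns the finished log set if this line completed one.
    Optional<List<String>> accept(String threadId, String text) {
        if (text.startsWith("BEGIN")) {
            // A new BEGIN replaces any unfinished set for this thread.
            open.put(threadId, new ArrayList<>());
        }
        List<String> current = open.get(threadId);
        if (current == null) {
            return Optional.empty(); // line outside any BEGIN..COMPLETE pair; ignored here
        }
        current.add(text);
        if (text.startsWith("COMPLETE")) {
            return Optional.of(open.remove(threadId));
        }
        return Optional.empty();
    }

    // Number of log sets still waiting for their COMPLETE line.
    int openSets() {
        return open.size();
    }
}
```

If lines from a single thread could be reordered on the way, a COMPLETE might arrive before its BEGIN and the set would be lost, which is why the ordering question below matters.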
If I use a JMS queue, is the order in which items are put onto the queue exactly the same as the order in which I get them off the queue?
I realise this is no longer an IO question, more a JMS question. I haven't done much work with JMS so just asking up front before I run with it.