Yesterday we had an amusing problem where one database server failed, and the application servers using that database began logging the failure up to the point that they filled their own drives and fell over too.
Preferably the logs would be written to a dedicated (local) partition, but that isn't going to happen any time soon. I was, however, thinking that since the logs are text and compress well, we could create a ZFS partition for them.
If ALL of your application servers fill their drives, it sounds like either they're underbudgeted for local storage, or you need smarter logging and some way to downcycle the apps until they can reach the database again. In other words, go into degraded mode and refuse transactions until a probe of the database indicates it's running reliably again.
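As a rough illustration of that degraded-mode idea, here is a minimal sketch in Python. Everything in it is hypothetical: the `DatabaseGate` name, the `probe` callable (whatever cheap health check your stack offers) and the 30-second retry interval are assumptions, not anything from your app servers.

```python
import time


class DatabaseGate:
    """Refuse work while the database looks down; re-probe at intervals."""

    def __init__(self, probe, retry_interval=30.0, clock=time.monotonic):
        self._probe = probe                  # callable returning True when the DB is healthy
        self._retry_interval = retry_interval
        self._clock = clock                  # injectable so the logic can be tested
        self._degraded = False
        self._next_probe_at = 0.0

    def report_failure(self):
        """Call when a database operation fails; enters degraded mode."""
        self._degraded = True
        self._next_probe_at = self._clock() + self._retry_interval

    def allow(self):
        """Return True if a transaction should be attempted right now."""
        if not self._degraded:
            return True
        if self._clock() < self._next_probe_at:
            return False                     # still backing off: no probe, no log spam
        if self._probe():
            self._degraded = False           # database is back, resume normal service
            return True
        self._next_probe_at = self._clock() + self._retry_interval
        return False
```

The request path would call `allow()` before touching the database and `report_failure()` when a call fails, so during an outage you get one failed probe per interval instead of one logged error per request.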
It's common these days for Linux systems to maintain a set of several generations of logs and to compress all but the active one. I imagine that the logrotate facility is taking care of this. That saves on long-term log space, but for short-term situations, probably the best thing to do is simply not log each and every error at a fine-grained level.
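One way to avoid logging each and every error is to drop repeats of the same message inside a time window. The sketch below uses Python's standard `logging.Filter`; the `RepeatSuppressFilter` name and the 60-second window are made up for illustration, and your app servers may well have their own equivalent.

```python
import logging
import time


class RepeatSuppressFilter(logging.Filter):
    """Let a given message through at most once per interval."""

    def __init__(self, interval=60.0, clock=time.monotonic):
        super().__init__()
        self._interval = interval
        self._clock = clock
        self._last_seen = {}   # grows with the number of distinct messages

    def filter(self, record):
        # record.msg is the unformatted template, so "connect to %s failed"
        # is treated as one message regardless of its arguments.
        key = (record.name, record.levelno, str(record.msg))
        now = self._clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self._interval:
            return False       # same message seen recently: drop it
        self._last_seen[key] = now
        return True
```

It would be attached with something like `handler.addFilter(RepeatSuppressFilter())`. A fuller version would probably also count the suppressed records and report the total when the window expires.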
Another possibility is to send the messages to a remote logging host attached to a SAN, but in this case, you'd probably have had a network implosion as well.
1) the servers don't maintain state locally except for logs, so the drives are small
2) the log settings are defined 'global' and apply to all app server instances on all servers. It is something I would like to address.
3) the default logging level is more verbose than I like
Our logs in other places get rotated and zipped, but that is a nightly rotation, so it didn't help here: we had several log files on the order of 5G+ on each server. Rotation and zipping weren't going to help at that stage.
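For comparison, rotation triggered by size rather than by the clock does put a ceiling on how much a runaway log can eat. This is only a sketch, assuming the logging is (or could be) done through Python's standard `logging` module; `make_capped_handler` and the 100 MB / 5 backup limits are illustrative values, not a recommendation for these servers.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def _gzip_namer(name):
    # Rotated files become app.log.1.gz, app.log.2.gz, ...
    return name + ".gz"


def _gzip_rotator(source, dest):
    # Compress the file being rotated out, then remove the uncompressed copy.
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def make_capped_handler(path, max_bytes=100 * 1024 * 1024, backups=5):
    """Rotate on size, keeping at most `backups` gzip-compressed generations."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.namer = _gzip_namer
    handler.rotator = _gzip_rotator
    return handler
```

With this, worst-case disk use is roughly one uncompressed file of `max_bytes` plus `backups` compressed generations; the oldest entries are discarded once that limit is reached.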