I'm trying to find a way to prevent OOMs in our Tomcat. There are several web apps, and sometimes one of them consumes a lot of memory, crashing the entire process. I would like to have a manager web app (like Tomcat's manager) that will detect this and perhaps undeploy/redeploy the problematic web app. Another solution (I don't think it's possible) would be to allocate a slice of the heap to each web app separately.
Changing the existing web apps is possible, but I'd rather not.
I have seen the suggestion of "throttling" requests using some sort of custom filter. The idea is to prevent new requests from starting to process while a monster memory-consuming request is already running.
You might simply sleep the request thread or send a "try again later" message.
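A minimal sketch of that throttling idea, with made-up names: the decision is just a ratio check against the heap, which a servlet `Filter` could call from `doFilter()` and answer with `HttpServletResponse.sendError(503, "try again later")` when the gate is closed. The threshold value is an assumption you would tune.

```java
// Hypothetical sketch of the throttling idea: a simple heap "gate" whose
// check a servlet Filter could run in doFilter(), rejecting new requests
// with a 503 while the heap is under pressure.
public class HeapGate {

    // True when used/max heap crosses the given threshold (e.g. 0.90).
    static boolean shouldThrottle(long usedBytes, long maxBytes, double threshold) {
        return maxBytes > 0 && (double) usedBytes / maxBytes >= threshold;
    }

    // Current heap usage of this JVM, from the Runtime API.
    static boolean heapUnderPressure(double threshold) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return shouldThrottle(used, rt.maxMemory(), threshold);
    }

    public static void main(String[] args) {
        System.out.println("throttling now? " + heapUnderPressure(0.90));
    }
}
```

Note this still can't tell you *which* webapp is the monster; it only stops the whole instance from digging the hole deeper.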
If a processor routinely goes OOM, then it's probably not very well written, since memory doesn't just up and use itself: you're either seeing unnatural levels of recursion, or something is in serious gobble mode (in one case, unfortunately, I spent an entire month discovering that that something was an Oracle JDBC driver).
If you cannot fix it, undeploying and redeploying will probably only compound the problem, since that's likely to lock up increasingly large chunks of PermGen space with each redeployment.
An alternative is to sandbox the offending app. Put it in a completely separate Tomcat instance where it can only damage itself, and set up a proxying mechanism so that the app continues to appear as though it's coming from the primary Tomcat. You'll need about 100MB (give or take) for the overhead of a second Tomcat JVM, but that's the least of your worries.
I'll try to explain the use case better:
In a very, VERY large product (over 1000 developers, ~1GB of source code, and 8 years of maintenance), I'm in charge of part of the platform layer, which deploys a Tomcat instance in one of its processes. This Tomcat receives requests from a web server (via a rewrite rule). Other development teams (from different sites around the world) deploy their web apps on that Tomcat. Some of these web apps crash, or eat up the process's heap space or PermGen memory. We are backward compatible, so no moving of web apps, URLs, etc. We are short on memory as well, so we can't afford the overhead of more Tomcats (though I love the sandbox idea).
I've already dealt with the PermGen leaks generally (by contacting the dev teams responsible for the leaking web apps), but heap usage is another issue: I can't limit heap usage per web app, and I can't easily identify which of the web apps is consuming the heap.
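For detecting (if not attributing) the heap pressure, the `java.lang.management` API can at least raise a flag when the heap crosses a watermark, without touching any of the web apps. A sketch, assuming Java 8+ and a 90% watermark chosen arbitrarily:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.Notification;
import javax.management.NotificationEmitter;

public class HeapWatch {

    // Arm a usage threshold on every heap pool that supports one;
    // returns how many pools were armed.
    static int armThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported() && max > 0) {
                pool.setUsageThreshold((long) (max * fraction));
                armed++;
            }
        }
        return armed;
    }

    public static void main(String[] args) {
        System.out.println("pools armed: " + armThresholds(0.90));

        // The MemoryMXBean emits a JMX notification when a threshold is crossed.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                System.err.println("heap watermark crossed: " + n.getMessage());
            }
        }, null, null);
    }
}
```

From the listener you could dump thread names (which Tomcat tags with the webapp's context during request processing) to get at least a circumstantial hint about who was busy when the watermark was crossed.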
Another issue I have is that threads created in a web app aren't destroyed when it is redeployed.
I'm trying to code a solution for governing and managing these nasty web apps (right now I'm thinking of instrumentation: redefining the Thread class to use Tomcat's executor instead of the classic implementation).
I hope my problems are clearer now, and thanks for the help.
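On the leaked-threads point: if the teams can be persuaded to route background work through an `ExecutorService`, an undeploy hook can stop those threads deterministically. A sketch with made-up names; in a real webapp this would implement `javax.servlet.ServletContextListener` and call `shutdown()` from `contextDestroyed()`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: give each webapp's background tasks an owned pool,
// so an undeploy hook (contextDestroyed() in a ServletContextListener)
// can stop them instead of leaking threads across redeployments.
public class BackgroundWork {

    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public void submit(Runnable task) {
        pool.submit(task);
    }

    // Drain politely, then interrupt any stragglers.
    public void shutdown() throws InterruptedException {
        pool.shutdown();
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
    }

    public boolean isStopped() {
        return pool.isShutdown();
    }
}
```

This only helps threads that cooperate with interruption; a task that swallows `InterruptedException` will still outlive the undeploy.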
Oh joy! Typical life in the Big City. Lots of ragged old apps all jammed together into one big fraying mess.
What you ultimately have here is a managerial/political problem. Since the heap is common to all webapps, each webapp is responsible for its own heap usage, and there's no such thing as a per-webapp memory restriction system. The problem is compounded by the fact that J2EE webapps don't run as processes; they run by borrowing threads from a shared pool and returning (we hope!) those threads to the pool at the end of each request. That means there's no central process that holds resources and thus nothing to hang monitoring/control onto.
From your description, it sounds like you probably have several sets of stakeholders behind the applications, so one of the first things you need to do is get them on board, because somebody is going to have to pay to resolve this problem, and by rights, it should be the beneficiaries. You need to get a study commissioned to analyse the system and find where the offenders are, then get authority to remedy the situation.
Do NOT take "If it ain't broke, don't fix it" for an answer. A) It's patently already broken. B) In software, a lot of the "breakage" comes not from rot within the application, but because the application doesn't run in isolation and cumulative changes in the application's environment will eventually break the application from the outside in. The only deadlier maxim in IT I know of is "AYHTDI" (All You Have To Do Is...). Java is a lot more "future-proof" than most environments, but nothing lasts forever. Failing to allow for this is like welding the oil-change cap on your car in place. It means that the application's cost was not properly calculated, just like the true cost of a car is more than just what you pay the dealer (or finance company).
If you cannot spare the RAM to run a second Tomcat on the same machine, see if you can find a second server somewhere. It doesn't have to be a full-powered, state-of-the-art system, as long as it can run Tomcat and an app or two. Then retarget your proxying mechanism to point to this server. Since you are doing URL rewriting instead of true proxying via mod_proxy or mod_jk, there may be some application mods required to handle the new URL, though. This, however, is a temporary band-aid solution. Ultimately, you need to get someone to cough up the resources to do a proper job of clean-up.