We have an application that loads third-party jars and executes them. They are not really third-party; we develop them ourselves. The basic idea is that we release "models" packaged in jars, and the application executes them on the grid. The grid has worker processes that receive execution requests, download the model jar, and load the model in a classloader.
We got a report that the application slows down after about a day of heavy usage. It runs fine at 9 AM, but by 5 PM it is dead slow. A cron job restarts the worker processes at night. It behaved as if it were leaking memory. We started profiling the app and saw that heap usage looked fine. JConsole reported that GC took about 1 minute over a couple of hours of CPU usage. We did see that PermGen usage was considerably higher than normal.
It looks like the same model is getting loaded multiple times. Ideally, a particular model should be loaded only once: when a worker gets a request for a model for the first time, it should create a classloader and load the model, and on every later request it should reuse that same loaded model. For some reason, though, the model is getting loaded repeatedly. By the time it has been loaded 32 times, the application slows down. We ran another test in which we started a lot of parallel runs to trigger loading the model multiple times, and the system slowed to a crawl in about 2 hours. It kept executing overnight and went OOM (out of PermGen space) after 14 hours.
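To make the intended "load once, then reuse" behaviour concrete, here is a minimal sketch. It is not the application's actual loader: the class name `ModelLoaderCache`, the method `loaderFor`, and the idea of keying the cache by a model id string are all assumptions for illustration. The lookup and the creation happen under one lock, so parallel requests for the same model cannot each create their own classloader, which is one plausible way duplicates could appear under parallel runs.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps at most one classloader per model id.
final class ModelLoaderCache {
    private final Map<String, ClassLoader> loaders = new HashMap<>();
    private int created = 0;

    // Check-then-create is done under a single lock, so two threads asking
    // for the same model at the same time still share one classloader.
    public synchronized ClassLoader loaderFor(String modelId, URL... jarUrls) {
        ClassLoader loader = loaders.get(modelId);
        if (loader == null) {
            loader = new URLClassLoader(jarUrls, ModelLoaderCache.class.getClassLoader());
            loaders.put(modelId, loader);
            created++;
        }
        return loader;
    }

    // Number of classloaders created so far; useful for spotting duplicates.
    public synchronized int loadersCreated() {
        return created;
    }
}
```

Logging `loadersCreated()` next to the number of distinct models requested would show quickly whether the cache is being bypassed. Note that a cache like this also pins every classloader for the lifetime of the worker, so it only helps if the set of models is bounded.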
So my question is: can high PermGen usage cause GC pauses similar to those caused by high usage of heap space? From observation, the system behaves the way it would if there were a memory leak in the heap. However, JConsole doesn't report high GC usage.
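One way to check this from inside the worker, rather than only through JConsole, is to log the JVM's own memory pool, class loading, and GC counters at intervals. The sketch below uses the standard `java.lang.management` beans; the class name `GcAndPoolReport` and the output format are made up for illustration, and the pool names printed depend on the JVM version and collector in use.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper: dumps memory pools, class counts and GC totals.
final class GcAndPoolReport {

    static String report() {
        StringBuilder sb = new StringBuilder();

        // Every memory pool the JVM exposes, heap and non-heap alike.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            sb.append(String.format("pool %s (%s): used=%d KB, max=%d KB%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed() / 1024,
                    usage.getMax() < 0 ? -1 : usage.getMax() / 1024));
        }

        // Loaded/unloaded class counts; a count that only ever grows hints at
        // classloaders that are never released.
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        sb.append(String.format("classes: loaded=%d, totalLoaded=%d, unloaded=%d%n",
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount()));

        // Cumulative collection count and time per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("gc %s: collections=%d, timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling `report()` every few minutes and comparing the snapshots would show whether the permanent generation pool, the loaded class count, and the cumulative GC time climb together as the day goes on.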
Jayesh A Lalwani wrote: It looks like the same model was getting loaded multiple times. Ideally, a particular model should get loaded only once...
I'm probably stating the bleedin' obvious here, but don't you think that's where you should concentrate your efforts? Seems to me that whatever your "Model loader" is supposed to be doing, it ain't.
Sorry I can't be more specific, but it sounds like you've got a fairly involved algorithm (which may, in itself, be part of the problem). And from the few pages I glanced at, it seems that PermGen space is a known issue with custom ClassLoaders.