Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Strange garbage collector behaviour

 
Erik J�gi
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

I am seeing quite a strange behaviour from the garbage collector in one of our applications.

The application is a standalone Java app, it receives requests over a TCP socket and responds to them. It is basically a kind of a cache before the database. There is a constant request flow of 10-100 requests per second. Average response time is below 50ms. It is running on a multiprocessor Sun machine, JDK 1.4.1_02.

The problem arises when the app has been running for a while. Depending on the allowed maximum memory size - with 256mb after about half a day, with 512mb after about a day, the garbage collector seems to be running most of the time. We turned on GC logging and from the log I can see that initially for a long period only the DefNew collector is used. The GC times are ~0.05sec then. Then after some time old generation collections start taking place. These collections take 3-4 seconds. The old gen collections then become more frequent until all that is going on, is one collection after another. About 3-4seconds for the collection, then after another 3-4 seconds the same thing again.

Can anyone please help in understanding what can cause this kind of behaviour? I am quite sure that the app is stable in a sense that it doesn't start to eat up extra memory at some point.

Any help greatly appreciated,
erik


Here are some excerpts from the GC log:

After startup:

And then in the end:

I can provide the whole log if neccessary.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sorry, can't really tell you what's happening.

But I would advice you to try the -server switch when starting the application. This optimizes the JVM for long-running applications - as far as I know it also uses a different GCing strategy.
 
Kris Philippaerts
Greenhorn
Posts: 25
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The fact that GC-ing old collections takes up to 5 seconds is not so strange. That's why this kind of garbage collection is posponed as long as possible. The only reason for the GC to collect the old collection garbage, is when normal garbage collection is no longer sufficent.
In general, this is an indication that your application can't handle the load, or has a small leak in. For your kind of app, an app that handles a continous flow of requests, and doesn't deal with request bursts, this is no normal behaviour.

You should check your app again. Assigning null values to objects, in particular collections, that are no longer needed is good practice. Check your application for global collections that might keep some sort of an hidden handle to your datastructures, or something like that.

If you are absolutely sure you can not improve you code anymore, try thinking about the load. I don't know how you handle your requests. Do you start a new thread for every request? Do you use a threadpool with a queue, or just one thread with a queue? Or one thread w/o a queue? In any case, your server might be overloaded. If you send more requests than your server can handle, and you don't give the server time to catch up, the queue of incoming requests only grows and grows, until you get an OOM. Maybe you want to cluster your program and divide the requests among the different machines. It might be a bit of programming work, but I think it shouldn't take too long...

Good luck
Kris
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Kris Philippaerts:
You should check your app again. Assigning null values to objects, in particular collections, that are no longer needed is good practice.


I disagree in the general case. There are only very rare cases in which explicitely setting a reference to null makes any sense.
 
Erik J�gi
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Kris Philippaerts:
You should check your app again.


This is exactly what I ended up with myself. So I did a recheck and also downloaded an evaluation version of a profiler which showed me what I didn't believe in the beginning - we have a leak. After some monitoring with the profiler I found the reason. It wasn't so much a leak but rather a silly bug where one of the cache invalidation algorithms had a minus sign in the wrong place and basically the cache was never cleared. Problem solved now.

Best regards,
erik
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic