This is a strange one. I run a lot of performance tests and have noticed that the standard collector appears to perform better (reproducibly) after a full GC at the start. Currently I force this at startup by filling memory and catching the exception rather than calling System.gc(). By better I mean it seems better at collecting shorter-lived objects going forward; it seems to catch more. We collect GC logs and use GC log viewer. A few theories spring to mind, e.g. a small amount of clear-out at the start, or the GC tuning itself? Has anyone seen similar results (Java 6)? I initially found this with a small test app I was using to prove an issue with weak references, but have since moved it to a large, long-running server application, and the difference is big.
"Eagles may soar but weasels don't get sucked into jet engines" SCJP 1.6, SCWCD 1.4, SCJD 1.5,SCBCD 5
How exactly does "better at collecting shorter lived objects going forward, seems to catch more" translate to CPU, time or memory savings? (I actually don't understand what the sentence means...)
Can the same improvement be achieved by setting the Xmin and Xmax to the same value (so that the VM allocates all its memory at the start, which is what you achieve by allocating until the OutOfMemoryError)?
Doesn't the effect "wear out" over time (is it still noticeable after, say, 1000 GC runs, disregarding previous runs)?
i) We always run with Xmx = Xms, in all test cases. What is Xmin?
ii) I've currently done runs of up to 2 days (that's a lot of GCs; I'll take it to 5-7 days eventually) and there is no sign of it wearing out. The application I'm currently profiling is a very heavily loaded server handling scripted clients.
iii) By better I mean this: we generate objects which are not permanent. If they live long enough relative to the GC cycle they get promoted, and could then survive until a full GC. This promotion edges us towards a full GC over time. We fully recover from that full GC, but it's not optimal (expensive pause). We see this as a creeping used heap, which flattens with the change.
So in GC viewer, without my change the used heap rises steeply (this memory can be reclaimed with a full GC); with it, the used heap almost flat-lines. The second case is preferable because the partial GCs are much reduced and not rising, whereas in the first case a full GC, if it occurred, would be very expensive. We're using GC viewer for GC stats, and we can see the effects in our latency stats as well.
I was surprised the results transferred from my initial app to the server.
(This change appears to have no effect if CMS (default settings) is the collector and the JVM is on Solaris x86).
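The minor versus full collection counts described in the posts above can also be read from inside the JVM, which gives a cross-check against the GC log. What follows is only a minimal sketch using the standard java.lang.management API (available since Java 5). The class and method names (GcCounters, snapshot) are made up for illustration, and the collector names reported depend on the JVM and on which collector is selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the application discussed in this thread.
final class GcCounters {

    /** Maps each collector name to {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<String, long[]>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not report it for this collector.
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + ": count=" + e.getValue()[0]
                    + ", timeMs=" + e.getValue()[1]);
        }
    }
}
```

Taking a snapshot at intervals during a run, with and without the startup hack, should show whether the old-generation collector's count stays flat in one case and climbs in the other.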
Right. CMS will result in smaller pauses during full GC. In the long run, you are better off having GC run less often, because GC gets more efficient when it has more objects to collect. The problem is that a full GC results in longer pauses, which can be terrible for an application that is constantly running. Running GC more often might result in smaller pauses, but it also means more CPU spent on GC and therefore reduced throughput. That's why CMS is better for CPU-intensive applications: it results in smaller pauses during GC.
Instead of tricking the GC into running more often, you are better off using CMS. In the long run you will be "wasting" less CPU on GC.
Ultimately, if short-lived objects are becoming a problem, you might want to look at your design and try to reduce the number of short-lived objects. One strategy for improving the performance of highly concurrent, CPU-intensive applications is to pool frequently used resources (just as you pool database Connections). You can apply the same strategy to objects that are frequently used. For example, if you are frequently multiplying matrices together, you are better off creating a pool of result matrices and reusing them for every multiplication operation. That way you save a lot on GC. The trade-off is that you add a lot of complexity to the code, which is why you should do it only for the objects that give the most benefit.
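To make the matrix example concrete, here is a rough sketch of what such a pool could look like. It assumes all result matrices have the same fixed dimensions, and the names (MatrixPool, acquire, release, multiply) are invented for illustration. The synchronized methods are the simplest way to make it thread-safe; a heavily contended server might want something finer-grained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size matrix pool; a sketch, not a tuned implementation.
final class MatrixPool {
    private final int rows;
    private final int cols;
    private final Deque<double[][]> free = new ArrayDeque<double[][]>();

    MatrixPool(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    /** Hands out a previously released matrix if one is available, else allocates a new one. */
    synchronized double[][] acquire() {
        double[][] m = free.poll();
        return m != null ? m : new double[rows][cols];
    }

    /** Returns a matrix to the pool; the caller must not touch it afterwards. */
    synchronized void release(double[][] m) {
        free.push(m);
    }

    /** Writes a * b into out, overwriting every cell, so stale pooled contents do not matter. */
    static void multiply(double[][] a, double[][] b, double[][] out) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b[0].length; j++) {
                double sum = 0.0;
                for (int k = 0; k < b.length; k++) {
                    sum += a[i][k] * b[k][j];
                }
                out[i][j] = sum;
            }
        }
    }
}
```

A caller would acquire() a result matrix, pass it to multiply(), use the result, and release() it when done. The extra bookkeeping around release() is exactly the added complexity mentioned above.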
CMS gets debated at length ;-) ... and is another story.
Any ideas why the full GC appears to improve performance in this scenario? One possibility we thought of was that the GC might be tuning itself in some way, and that by providing some appropriate initial parameters we could emulate this behaviour without the need for the code.
It's hard to recommend anything without actually looking at how GC performs. I would start by enabling CMS and playing around with how fast the generations grow or shrink.
I suspect that the reason you saw a performance boost is that the hack you did at startup caused the young generation to grow quickly, which makes it more efficient for GC to clear short-lived objects (up to a limit, of course). You can try observing the sizes of the different generations with your hack on. That might give you a clue about how to set up the GC at startup.
We've tried CMS many, many times with many, many settings for this particular application (for others you get acceptable results given the boundaries of the problem); the results have never been comparable, even without this hack.
I'll dive into the GC logs. I just thought I'd check in case anyone had spotted anything obvious I'd missed.
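One way to observe the generation sizes from inside the application, as opposed to reading them out of the GC log, is the memory pool MXBeans in java.lang.management. This is only a sketch: the class and method names (GenerationSizes, describeHeapPools) are hypothetical, and the pool names printed (eden, survivor, old/tenured) vary with the JVM and collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper for comparing generation sizes with and without the startup hack.
final class GenerationSizes {

    /** Returns one line per heap pool with its used, committed and max sizes in KB. */
    static String describeHeapPools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the maximum is undefined for the pool.
            long maxKb = u.getMax() < 0 ? -1 : u.getMax() / 1024;
            sb.append(pool.getName())
              .append(": used=").append(u.getUsed() / 1024).append("K")
              .append(" committed=").append(u.getCommitted() / 1024).append("K")
              .append(" max=").append(maxKb).append("K")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeHeapPools());
    }
}
```

Logging this shortly after startup in both configurations should show whether the committed young generation really ends up larger after the hack. If it does, that size would be a candidate for setting explicitly at startup.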