Unexpected Standard GC performance boost

Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

Hi,

This is a strange one. I run a lot of performance tests and noticed that the standard collector appears to perform better (reproducibly) after a full GC at the start (currently I force this at startup by filling memory and catching the exception rather than calling System.gc()). By "better" I mean it seems better at collecting shorter-lived objects going forward; it seems to catch more. We collect GC logs and use a GC log viewer. A few theories spring to mind, e.g. the full GC does a small amount of clear-out at the start, or the GC is tuning itself. Has anyone seen similar results (Java 6)? I initially found this with a small test app I was using to prove an issue with weak references, but I have since moved it to a large, long-running server application, and the difference is big.
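
For anyone who wants to reproduce the startup trick, here is a minimal sketch of the idea, not the actual code from our application. The class name `HeapFiller`, the chunk size and the `maxBytes` cap are hypothetical; the cap is only there so the method can be tried safely on a small scale. It assumes that HotSpot normally attempts a full collection before it throws OutOfMemoryError.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapFiller {
    /**
     * Allocates chunks until the heap is exhausted or maxBytes is reached,
     * then drops every reference so the memory becomes garbage again.
     * Returns the number of bytes that were successfully allocated.
     */
    public static long fillUntilOom(int chunkBytes, long maxBytes) {
        List<byte[]> hog = new ArrayList<byte[]>();
        long allocated = 0;
        try {
            while (allocated < maxBytes) {
                hog.add(new byte[chunkBytes]);
                allocated += chunkBytes;
            }
        } catch (OutOfMemoryError e) {
            // Assumption: the JVM has already tried a full GC before
            // giving up, which is the side effect this trick relies on.
        }
        hog = null; // release everything that was allocated
        return allocated;
    }
}
```

Calling `HeapFiller.fillUntilOom(1 << 20, Long.MAX_VALUE)` once at startup fills the whole heap in 1 MB chunks; a small cap such as `8L << 20` is useful for a dry run.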

Chris


"Eagles may soar but weasels don't get sucked into jet engines" SCJP 1.6, SCWCD 1.4, SCJD 1.5,SCBCD 5
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    
  60

How exactly does "better at collecting shorter lived objects going forward, seems to catch more" translate to CPU, time or memory savings? (I actually don't understand what the sentence means...)

Can the same improvement be achieved by setting the Xmin and Xmax to the same value (so that the VM allocates all its memory at the start, which is what you achieve by allocating until the OutOfMemoryError)?
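
Whether a JVM is actually running with a fixed-size heap can be checked from inside the process. The following is a rough sketch, assuming the HotSpot flags `-Xms` (initial heap) and `-Xmx` (maximum heap) and their usual k/m/g size suffixes; the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapFlags {
    /** Parses a size such as "512m" or "2g" into bytes (plain digits = bytes). */
    public static long parseSize(String s) {
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        long mult;
        switch (last) {
            case 'k': mult = 1024L; break;
            case 'm': mult = 1024L * 1024; break;
            case 'g': mult = 1024L * 1024 * 1024; break;
            default:  return Long.parseLong(s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * mult;
    }

    /** Returns the value of the last argument starting with prefix, or -1 if absent. */
    public static long flagValue(List<String> jvmArgs, String prefix) {
        long value = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix) && arg.length() > prefix.length()) {
                value = parseSize(arg.substring(prefix.length()));
            }
        }
        return value;
    }

    /** True when both flags are present and specify the same number of bytes. */
    public static boolean isFixedHeap(List<String> jvmArgs) {
        long xms = flagValue(jvmArgs, "-Xms");
        long xmx = flagValue(jvmArgs, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("fixed-size heap: " + isFixedHeap(jvmArgs));
    }
}
```

The flags are parsed rather than compared through the reported heap sizes because the maximum the JVM reports at runtime can differ slightly from the value given on the command line.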

Doesn't the effect "wear out" over time (is it still noticeable after, say, 1000 GC runs, disregarding previous runs)?
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

Hi,

i) We always run with Xmx = Xms, in all test cases. What is Xmin?

ii) So far I've done runs of up to 2 days (that's a lot of GCs; I'll take it to 5-7 days eventually), and there is no sign of it wearing out. The application I'm currently profiling is a very heavily loaded server handling scripted clients.

iii) By "better" I mean this: we generate objects which are not permanent. If they live long enough relative to the GC cycle, they are promoted and can then survive until a full GC. This promotion edges us towards a full GC over time, from which we fully recover, but it's not optimal (an expensive pause). We see this as a creeping used heap, which flattens with the change.
So in GC viewer, without my change the used heap rises steeply (this memory can be reclaimed by a full GC); with it, the used heap almost flat-lines. The second case is preferable because the partial GCs are much lower and not rising, whereas in the first case a full GC, if it occurred, would be very expensive. We're using GC viewer for GC stats, and we can see the effects in our latency stats too.
I was surprised the results transferred from my initial app to the server.
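
The partial versus full collection behaviour described above can also be watched from inside the JVM, without parsing logs. This is only a sketch using the standard `java.lang.management` API; the class name `GcCounters` is made up, and the collector names it prints depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcCounters {
    /**
     * Returns, per collector, a two-element array:
     * [0] = number of collections so far, [1] = accumulated time in ms.
     * Either value is -1 if the JVM does not report it.
     */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> out = new LinkedHashMap<String, long[]>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.put(gc.getName(),
                    new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + ": count=" + e.getValue()[0]
                    + ", timeMs=" + e.getValue()[1]);
        }
    }
}
```

Taking a snapshot periodically and comparing the counts for the young and old collectors shows how quickly a run is heading towards a full GC.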

(This change appears to have no effect if CMS (default settings) is the collector and the JVM is on Solaris x86).
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    
  28

Right. CMS will result in smaller pauses during full GC. In the long run, you are better off having GC run less often, because GC gets more efficient when it has to collect more objects. The problem with that is that full GC results in longer pauses, which can be terrible for an application that is constantly running. Running GC more often might result in smaller pauses, but it would result in more CPU being spent on GC, which would reduce throughput. That's why CMS is better for CPU-intensive applications: it results in smaller pauses during GC.

Instead of tricking the GC into running more often, you are better off using CMS. In the long run you will be "wasting" less CPU on GC

Ultimately, if short-lived objects are becoming a problem, you might want to look at your design and try to reduce the number of short-lived objects. One strategy for improving the performance of highly concurrent, CPU-intensive applications is to pool frequently used resources (just like you pool database Connections). You can apply the same strategy to objects that are frequently used. For example, let's say you are frequently multiplying matrices together: you are better off creating a pool of result matrices and reusing them for every multiplication operation. That way you save a lot on GC. The trade-off is that you add a lot of complexity to the code. That's why you should only do it for the objects that give the most benefit.
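
To make the matrix example concrete, here is a minimal sketch of such a pool. Everything in it (the `MatrixPool` name, square matrices of a fixed size, the `acquire`/`release` methods) is an illustrative assumption, not code from any particular library.

```java
import java.util.ArrayDeque;

public class MatrixPool {
    private final int n;
    private final ArrayDeque<double[][]> free = new ArrayDeque<double[][]>();

    public MatrixPool(int n) {
        this.n = n;
    }

    /** Hands out a pooled n x n matrix, allocating only when the pool is empty. */
    public synchronized double[][] acquire() {
        double[][] m = free.poll();
        return (m != null) ? m : new double[n][n];
    }

    /** Returns a matrix to the pool; the caller must not use it afterwards. */
    public synchronized void release(double[][] m) {
        free.push(m);
    }

    /** Writes a * b into out. Every cell of out is overwritten; out must not alias a or b. */
    public static void multiply(double[][] a, double[][] b, double[][] out) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                out[i][j] = sum;
            }
        }
    }
}
```

A caller would `acquire()` a result matrix, pass it to `multiply`, use the result, and then `release()` it. The extra bookkeeping, and the risk of using a matrix after releasing it, is the added complexity mentioned above.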
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

CMS gets debated at length ;-) ... and is another story ..

Any ideas why the full GC appears to improve performance in this scenario? One possibility we thought of was that the GC might be tuning itself in some way, and that by providing some appropriate initial parameters we could emulate this behaviour without the need for the code.

Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    
  60

Xmin and Xmax should have been Xms and Xmx. My bad.

I had some ideas that revolved around the gradual allocation of the memory when Xms < Xmx. It's moot now, of course.

I would guess that the initiation you perform has some effect on generation sizing by the GC. Have you already read some GC performance tuning guides, such as this?
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

Hi, yep tried a lot of settings and done a lot of background reading. I haven't found an explanation for this as yet.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    
  28

You probably found a hack. I wouldn't rely on it
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

I guess that's why I am trying to understand it in terms of GC configuration i.e. achieve the same result in a different way.
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    
  28

It's hard to recommend anything without actually looking at how GC performs. I would start by enabling CMS and playing around with how fast the generations grow or shrink.

I suspect that the reason you saw a performance boost was because the hack that you did at startup caused the young generations to grow quickly, which makes it more efficient for GC to clear short lived objects (up to a limit of course). You can try observing the sizes of the different generations with your hack on. That might give you a clue about how to setup the GC at startup.
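
One way to observe the generation sizes from inside the application is the standard `java.lang.management` API. This is a sketch only; the class name `GenerationSizes` is made up, and the pool names that come back (eden, survivor, old or tenured) vary with the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class GenerationSizes {
    /** Returns one line per heap memory pool with its used, committed and max bytes. */
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // pool is no longer valid
            }
            lines.add(pool.getName() + ": used=" + u.getUsed()
                    + " committed=" + u.getCommitted()
                    + " max=" + u.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Logging this once just after startup, with and without the hack, and again after the server has been under load for a while should show whether the young generation ends up sized differently in the two cases.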
Chris Hurst
Ranch Hand

Joined: Oct 26, 2003
Posts: 407
    
    1

We've tried CMS many, many times with many, many settings for this particular application (for others you get acceptable results given the boundaries of the problem); the results have never been comparable, even without this hack.

I'll dive into the GC logs, I just thought I'd check in case anyone spotted anything obvious I'd missed.
 