
Unexpected Standard GC performance boost

 
Ranch Hand
Hi,

This is a strange one. I run a lot of performance tests and have noticed that the standard collector appears to perform better (reproducibly) after a full GC at the start. Currently I force this at startup by filling memory and catching the resulting OutOfMemoryError, rather than calling System.gc(). By "better" I mean it seems better at collecting shorter-lived objects from then on; it seems to catch more. We collect GC logs and view them in a GC log viewer. A few theories spring to mind, e.g. a small amount of clear-out at the start, or the GC tuning itself. Has anyone seen similar results (Java 6)? I initially found this with a small test app I was using to demonstrate an issue with weak references, but I have since moved it to a large, long-running server application, and the difference is big.
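A minimal sketch of the start-up trick described above, in Java 6 syntax. It assumes the aim is simply to exhaust the heap once: the JVM only throws OutOfMemoryError after the collector has failed to free enough space, which in HotSpot normally means a full collection has run. `HeapFiller` and `fillHeap` are hypothetical names, and the `maxChunks` cap exists only so the method can be exercised without filling the heap; for the real trick it would be `Integer.MAX_VALUE`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "fill memory and catch the error" at startup.
class HeapFiller {

    /**
     * Allocates chunkBytes-sized arrays until either maxChunks have been
     * allocated or the JVM throws OutOfMemoryError. Before throwing that
     * error the JVM has to try to reclaim memory, which in HotSpot normally
     * means a full collection.
     *
     * @return number of chunks allocated before stopping
     */
    static int fillHeap(int chunkBytes, int maxChunks) {
        List<byte[]> hog = new ArrayList<byte[]>();
        try {
            while (hog.size() < maxChunks) {
                hog.add(new byte[chunkBytes]);
            }
        } catch (OutOfMemoryError expected) {
            // Heap exhausted. Nothing else to do: every chunk becomes
            // unreachable garbage as soon as this method returns.
        }
        return hog.size();
    }
}
```

It would be called once at startup, e.g. `HeapFiller.fillHeap(1024 * 1024, Integer.MAX_VALUE)`, before the server starts taking load. Catching OutOfMemoryError is only reasonable that early, because any other running thread could also be hit by the error.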

Chris

 
Sheriff
How exactly does "better at collecting shorter lived objects going forward, seems to catch more" translate to CPU, time or memory savings? (I actually don't understand what the sentence means...)

Can the same improvement be achieved by setting Xmin and Xmax to the same value (so that the VM allocates all its memory at the start, which is what you achieve by allocating until the OutOfMemoryError)?
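For reference, the HotSpot options are `-Xms` (initial heap size) and `-Xmx` (maximum heap size). A minimal sketch for confirming from inside a running JVM that both were passed and match. It assumes the sizes were given explicitly on the command line (it does not see defaults or other ways of setting the heap size), accepts only the `k`, `m` and `g` suffixes, and treats the last occurrence of a repeated flag as the one in effect. `HeapArgs` and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapArgs {

    // Parses a size such as "512m", "2g", "1024k" or "1048576" into bytes.
    static long parseSize(String s) {
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        long multiplier = 1;
        String digits = s;
        if (last == 'k' || last == 'm' || last == 'g') {
            digits = s.substring(0, s.length() - 1);
            multiplier = last == 'k' ? 1024L : last == 'm' ? 1024L * 1024 : 1024L * 1024 * 1024;
        }
        return Long.parseLong(digits) * multiplier;
    }

    // Value of the last argument starting with prefix, or -1 if none is present.
    static long lastSize(List<String> args, String prefix) {
        long value = -1;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = parseSize(arg.substring(prefix.length()));
            }
        }
        return value;
    }

    // True only when both -Xms and -Xmx were given explicitly and are equal.
    static boolean initialEqualsMax(List<String> args) {
        long xms = lastSize(args, "-Xms");
        long xmx = lastSize(args, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xms equals -Xmx: " + initialEqualsMax(jvmArgs));
    }
}
```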

Doesn't the effect "wear out" over time (is it still noticeable after, say, 1000 GC runs, disregarding previous runs)?
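One way to answer the "does it wear out" question without eyeballing logs is to snapshot the cumulative counters the JVM exposes for each collector and compare windows, e.g. the first 1000 collections against a later window of the same length. A sketch using the standard `java.lang.management` API; `GcSnapshot` is a hypothetical name. Collector names depend on the collector in use (e.g. `Copy` and `MarkSweepCompact` for the serial collector, `ParNew` and `ConcurrentMarkSweep` for CMS), and `getCollectionTime()` is accumulated elapsed time, which for a concurrent collector is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcSnapshot {

    // {cumulative collection count, cumulative collection time in ms} per collector.
    // Either value can be -1 if the JVM does not support it for that collector.
    static Map<String, long[]> take() {
        Map<String, long[]> snapshot = new LinkedHashMap<String, long[]>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            snapshot.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return snapshot;
    }

    // Average time per collection between two snapshots, or 0 if none ran.
    static double averageMillis(long[] before, long[] after) {
        long collections = after[0] - before[0];
        return collections == 0 ? 0.0 : (double) (after[1] - before[1]) / collections;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = take();
        Thread.sleep(1000); // stand-in for a window of real application load
        Map<String, long[]> after = take();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] start = before.get(e.getKey());
            System.out.printf("%s: %d collections, avg %.1f ms%n",
                    e.getKey(), e.getValue()[0] - start[0], averageMillis(start, e.getValue()));
        }
    }
}
```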
 
Chris Hurst
Ranch Hand
Hi,

i) We always run with Xmx = Xms, in all test cases. What is Xmin?

ii) So far I've done runs of up to 2 days (that's a lot of GCs; I'll eventually take it to 5-7 days), and there is no sign of it wearing out. The application I'm currently profiling is a very heavily loaded server handling scripted clients.

iii) By "better" I mean the following. We generate objects which are not permanent; if they live long enough relative to the GC cycle, they are promoted and can then live until a full GC. Over time this promotion edges us towards a full GC, from which we fully recover, but that is not optimal (an expensive pause). We see this as a creeping used heap, which flattens out with the change.
So in GC viewer, without my change the used heap rises steeply (this memory can be reclaimed by a full GC); with the change it almost flatlines. The second case is preferable because the partial GCs cost much less and do not keep rising, whereas in the first case a full GC, if it occurred, would be very expensive. We use GC viewer for GC stats, and we can also see the effects in our latency stats.
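The "creeping used heap" can also be put into a number. A minimal sketch, assuming the creep is measured as old-generation occupancy sampled at a fixed interval, with a least-squares slope giving the growth per sample. The old-generation pool is found by a name heuristic (`Tenured Gen` for the serial collector, `PS Old Gen` for parallel, `CMS Old Gen` for CMS), and `HeapCreep` is a hypothetical name. A steep positive slope corresponds to the first case described above, a slope near zero to the flat line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

class HeapCreep {

    // Least-squares slope of samples taken at equal intervals (units per sample).
    static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) {
            meanY += s;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // Heuristic: the heap pool whose name mentions "Old" or "Tenured".
    static MemoryPoolMXBean oldGen() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP && (name.contains("Old") || name.contains("Tenured"))) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean old = oldGen();
        if (old == null) {
            System.out.println("No old-generation pool found");
            return;
        }
        long[] samples = new long[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = old.getUsage().getUsed();
            Thread.sleep(100); // demo interval; a real run would sample over minutes or hours
        }
        System.out.printf("%s grows by ~%.0f bytes per sample%n", old.getName(), slope(samples));
    }
}
```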
I was surprised the results transferred from my initial app to the server.

(This change appears to have no effect if CMS (default settings) is the collector and the JVM is on Solaris x86).
 
Rancher
Right. CMS will result in smaller pauses during full GC. In the long run, you are better off having GC run less often, because each GC run gets more efficient when it has more objects to collect. The problem with that is that a full GC results in longer pauses, which can be terrible for an application that runs constantly. Running GC more often might result in smaller pauses, but more CPU would be spent on GC, which reduces throughput. That's why CMS suits applications that cannot tolerate long pauses: it results in smaller pauses during GC.

Instead of tricking the GC into running more often, you are better off using CMS. In the long run you will be "wasting" less CPU on GC.

Ultimately, if short-lived objects are becoming a problem, you might want to look at your design and try to reduce the number of short-lived objects. One strategy for improving the performance of highly concurrent, CPU-intensive applications is to pool frequently used resources (just as you pool database Connections). You can apply the same strategy to objects that are used frequently. For example, if you are frequently multiplying matrices, you are better off creating a pool of result matrices and reusing them for every multiplication. That way you save a lot on GC. The trade-off is that you add a lot of complexity to the code, so you should only do it for the objects that give the most benefit.
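A minimal sketch of the pooling idea for the matrix example, assuming square `double[][]` matrices of one fixed size and single-threaded use (a concurrent application would need a thread-safe pool or one pool per thread). `MatrixPool` and its methods are hypothetical. Every caller has to `release` results it no longer needs, which is exactly the extra complexity mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of reusable n x n result matrices (not thread-safe).
class MatrixPool {
    private final int n;
    private final Deque<double[][]> free = new ArrayDeque<double[][]>();

    MatrixPool(int n) {
        this.n = n;
    }

    // Reuses a released matrix if one is available, otherwise allocates a new one.
    double[][] acquire() {
        double[][] m = free.poll();
        return m != null ? m : new double[n][n];
    }

    // Hands a matrix back for reuse; the caller must not use it afterwards.
    void release(double[][] m) {
        free.push(m);
    }

    int available() {
        return free.size();
    }

    // Multiplies a by b into a pooled result. Every cell is overwritten,
    // so stale values from a previous use never leak into the result.
    double[][] multiply(double[][] a, double[][] b) {
        double[][] result = acquire();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                result[i][j] = sum;
            }
        }
        return result;
    }
}
```

A pooled result is overwritten on every reuse, so a caller that keeps a reference after releasing it will see its data change underneath it; that class of bug is the main cost of this approach.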
 
Chris Hurst
Ranch Hand
CMS gets debated at length ;-) ... and is another story.

Any ideas why the initial full GC appears to improve performance in this scenario? One possibility we thought of was that the GC might be tuning itself in some way, and perhaps by providing appropriate initial parameters we could emulate this behaviour without needing the code.

 
Martin Vashko
Sheriff
Xmin and Xmax should have been Xms and Xmx. My bad.

I had some ideas that revolved around the gradual allocation of the memory when Xms < Xmx. It's moot now, of course.

I would guess that the initiation you perform has some effect on generation sizing by the GC. Have you already read some GC performance tuning guides, such as this?
 
Chris Hurst
Ranch Hand
Hi, yep, I've tried a lot of settings and done a lot of background reading. I haven't found an explanation for this yet.
 
Jayesh A Lalwani
Rancher
You probably found a hack. I wouldn't rely on it.
 
Chris Hurst
Ranch Hand
I guess that's why I'm trying to understand it in terms of GC configuration, i.e. to achieve the same result in a different way.
 
Jayesh A Lalwani
Rancher
It's hard to recommend anything without actually looking at how the GC performs. I would start by enabling CMS and experimenting with how quickly the generations grow or shrink.

I suspect you saw a performance boost because the hack you did at startup caused the young generation to grow quickly, which makes it more efficient for the GC to clear short-lived objects (up to a limit, of course). You could try observing the sizes of the different generations with your hack on. That might give you a clue about how to set up the GC at startup.
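To compare generation sizes with and without the start-up hack, one option besides the GC logs is to print each heap pool's used and committed size from inside the application. A sketch using the standard `java.lang.management` API; `GenerationSizes` is a hypothetical name, and pool names depend on the collector (the serial collector reports `Eden Space`, `Survivor Space` and `Tenured Gen`). Calling it at the same point in the workload in a run with the start-up fill and in a run without it would show whether the young generation really ends up sized differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class GenerationSizes {

    // One line per heap pool: name, used size and committed size in MB.
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache or perm gen
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            lines.add(String.format("%-20s used=%6d MB committed=%6d MB",
                    pool.getName(), u.getUsed() >> 20, u.getCommitted() >> 20));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```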
 
Chris Hurst
Ranch Hand
We've tried CMS many, many times with many, many settings for this particular application (for other applications you get acceptable results within the constraints of the problem); the results have never been comparable, even without this hack.

I'll dive into the GC logs, I just thought I'd check in case anyone spotted anything obvious I'd missed.
 