Forum:

Performance

WeakReference, SoftReference, caches & canonical tables

Jack Shirazi

Author

Posts: 96

posted 23 years ago

WeakReferences are (supposed to be) collected before SoftReferences. Sun recommends using SoftReferences for caches and WeakReferences for canonical tables. My point of view is that caches tend to grow and grow, and if the garbage collector is going to kick in, then I'd want the caches to be collected first (so I think caches should use WeakReferences).
However, I've never come across a situation where the theoretical difference between the two reference types mattered enough to be able to test which is better for performance (even ignoring the practical problem that current JVMs seem to always collect both types of references together regardless).
Has anyone here come across any evidence or strong reasoning for which reference should be used for which type of activity?
--Jack Shirazi http://www.JavaPerformanceTuning.com/

John Bateman

Ranch Hand

Posts: 320

posted 23 years ago

Hi
Please excuse my ignorance, but what are weak and strong references?
Maybe I've worked with or seen 'em and don't even know it.

SOURCE CODE should be SURROUNDED by "code" tags.

Peter Tran

Bartender

Posts: 783

posted 23 years ago

John,
Read this article.
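For readers without the article to hand, here is a minimal sketch of the three reference strengths being discussed. The class and method names are made up for illustration; only `java.lang.ref.SoftReference` and `java.lang.ref.WeakReference` come from the JDK. It shows what can be stated with certainty: while a strong reference exists, soft and weak references return the same object, and once a reference has been cleared, `get()` returns null.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical illustration class, not taken from the linked article.
class ReferenceStrengths {

    // While 'strong' is reachable, the collector may not clear either reference.
    static boolean sameReferentWhileStronglyHeld() {
        Object strong = new Object();                              // ordinary (strong) reference
        SoftReference<Object> soft = new SoftReference<>(strong);  // cleared only under memory pressure
        WeakReference<Object> weak = new WeakReference<>(strong);  // may be cleared at any GC once 'strong' is gone
        return soft.get() == strong && weak.get() == strong;
    }

    // clear() does by hand what the collector does when it reclaims the referent.
    static boolean clearedReferenceReturnsNull() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        weak.clear();
        return weak.get() == null;
    }
}
```

Callers of `get()` therefore always have to handle a null result, whichever reference type they pick.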
-Peter

Peter den Haan

author

Posts: 3252

posted 23 years ago

Originally posted by Jack Shirazi:
My point of view is that caches tend to grow and grow, and if the garbage collector is going to kick in, then I'd want the caches to be collected first (so I think caches should use WeakReferences).

Beg to differ. If the garbage collector needs to free up some memory, and it has the choice between memory which serves no purpose at all, and free-able (cache) memory which is still doing something useful, which do you think it should pick?
This is not dissimilar to OS cache management -- free memory is bad. You want as little of it as possible. When allocating memory, it should be taken from the free memory pool first and only then from cache pages.
Having said that, I agree that, in my admittedly limited experience, JVMs don't seem to distinguish between soft and weak references; everything gets cleaned up within a short time.
- Peter

Peter den Haan | peterdenhaan.com | quantum computing specialist, Objectivity Ltd

Jack Shirazi

Author

Posts: 96

posted 23 years ago

Okay, what we are discussing here is after the garbage collector has freed all the "no-purpose" memory. It has reclaimed all the garbage objects, and wants more space, so now it decides to plough into the Reference held objects.
And the question here is: I have two types of references, one of which will be reclaimed before the other (theoretically). So I have two sets of objects, and one set will be reclaimed before the other. Typical uses for the two sets of objects are object caches and canonical tables (other uses are feasible, but I'll stick to those two). So which should be reclaimed first? Why? What is the justification? As I said, I have not been able to find a working application where the difference could be measured, nor identified a convincing argument for preferring to allocate one reference type for one use or the other. Has someone else?
--Jack Shirazi http://www.JavaPerformanceTuning.com/

Peter den Haan

author

Posts: 3252

posted 23 years ago

Originally posted by Jack Shirazi:
[...] object caches and canonical tables [...]. So which should be reclaimed first? Why? What is the justification?

The canonical table, without question. The key to the answer is, what value does the information still have once it has become unreachable?
The information in the canonical table is completely useless once it is no longer directly reachable. You can just as well instantiate a new copy of the object when you need it. Except, of course, when this would take a lot of time and resources to do, but in that case you would want to use a cache rather than a canonical table. (To take another example, in a WeakHashMap the mapped objects are typically unreachable and useless once the application has lost all reference to the key).
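A canonical table along these lines might look like the sketch below. This is one common way to build it, assuming a `WeakHashMap` keyed by the canonical object with the value wrapped in a `WeakReference` so the table itself never keeps an entry alive; the class name `CanonicalTable` and the method `canonicalize` are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of a weakly held canonical table.
class CanonicalTable<T> {
    // Key and value refer to the same object; the value is wrapped in a
    // WeakReference so that the map does not strongly hold its own keys.
    private final Map<T, WeakReference<T>> table = new WeakHashMap<>();

    // Returns the canonical instance equal to 'candidate', registering
    // 'candidate' itself if no live canonical instance exists.
    synchronized T canonicalize(T candidate) {
        WeakReference<T> ref = table.get(candidate);
        T existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        table.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }
}
```

Once the application drops every reference to a canonical instance, the entry is eligible for collection, and the next caller simply registers a fresh copy.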
In a cache, on the other hand, the cached information presumably takes time and resources to reconstruct, which is why you want to cache it in the first place. Even when the application has lost all reference to the information, there is a definite chance that it will be referred to again (otherwise you wouldn't cache it) so there is every reason to hold on to the data unless you have a better purpose for the memory.
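By contrast, a cache in this sense could be sketched as follows, assuming values are expensive to rebuild and are therefore held through `SoftReference`. The class `SoftCache` and its `loader` parameter are hypothetical names, and the sketch leaves out housekeeping such as removing map entries whose references have been cleared (a fuller version would likely use a `ReferenceQueue`).

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a softly held cache.
class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // the expensive reconstruction step

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, rebuilding it if it was never loaded
    // or if the collector has cleared the soft reference.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

Two threads missing on the same key may both call the loader here; that is acceptable for a sketch but worth tightening in real code.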
Does this make it clearer? I'm not entirely sure I understand what you find most unclear.
- Peter

Peter den Haan | peterdenhaan.com | quantum computing specialist, Objectivity Ltd

Jack Shirazi

Author

Posts: 96

posted 22 years ago

Hmm. What I find unclear is whether choosing the canonical table is indeed more efficient. I'll think it through further. Firstly, if I have a few small canonical objects, I'm not going to bother with a canonical table at all. So it's only when the canonical objects are many or expensive to construct that I want to use the extra indirection that canonical tables impose. Is this cost greater than reproducing some cache elements which are likely in flux anyway? Many of the cache elements are probably garbage at any one time.
Then there is the issue of the likely difference in sizes between the canonical tables and caches. Even a large canonical table is quite likely to be significantly smaller than the cache. So if canonical table elements are reclaimed, will I always have the cache elements reclaimed as well anyway, as the canonical table won't release enough resources and the GC goes on to the next Reference class? And as far as I know, there is no prospect of only some elements being cleared, until resources are sufficient. For each Reference type it seems to be all or nothing. On the other hand, if I let the cache get cleared first, there is a much higher chance that the canonical table does not need to be cleared at all because of its probable smaller size.
The performance will be improved by whatever is likely to provide the required resources with the least effort. I don't see that it's an open-and-shut case. What I do see now from this discussion is that I'm being too simplistic in only thinking about canonical tables vs. caches. Considering your argument, I can see that it may be more useful to consider two classes of cache, distinguished primarily by Reference type. After all, a canonical table is really just a type of cache. Then you can put the more expensive elements into the cache that will be cleared later, and allow the cache that is cleared earlier to grow larger. The two cache classes can be subdivided within themselves into network object cache, canonical table, etc.
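The two-class idea could be sketched roughly as below: one map, with the caller deciding per entry whether it is expensive (held softly, cleared later in theory) or cheap (held weakly, cleared earlier). `TieredCache` and its `expensive` flag are hypothetical; how to judge "expensive" is left to the application.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cache entries split into two tiers by Reference type.
class TieredCache<K, V> {
    private final Map<K, Reference<V>> map = new HashMap<>();

    // Expensive values go into the soft tier, cheap ones into the weak tier.
    synchronized void put(K key, V value, boolean expensive) {
        Reference<V> ref;
        if (expensive) {
            ref = new SoftReference<>(value);
        } else {
            ref = new WeakReference<>(value);
        }
        map.put(key, ref);
    }

    // Returns the value, or null if it is absent or has been cleared.
    synchronized V get(K key) {
        Reference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key);   // drop the stale entry left by a cleared reference
        }
        return value;
    }

    // Reports which tier a key currently lives in; false if absent.
    synchronized boolean isSoftTier(K key) {
        return map.get(key) instanceof SoftReference;
    }
}
```

Whether the weak tier really is cleared before the soft tier still depends on the JVM, which is the open question in this thread.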
Thanks, that's been useful. Now I need to find a JVM that actually manages to clear the Reference objects separately, so that I can test this. Anyone know of one?
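A rough probe for that kind of test might look like this. It is only a sketch: `ClearingProbe` is a hypothetical name, `System.gc()` is merely a request that a JVM may ignore, and the counts will differ between JVMs, heap settings and runs, so no particular result should be expected.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: counts how many soft and weak references
// have been cleared after a garbage collection is requested.
class ClearingProbe {

    static int countCleared(List<? extends Reference<byte[]>> refs) {
        int cleared = 0;
        for (Reference<byte[]> r : refs) {
            if (r.get() == null) {
                cleared++;
            }
        }
        return cleared;
    }

    // Returns { cleared soft count, cleared weak count } for n referents of each kind.
    static int[] probe(int n) {
        List<SoftReference<byte[]>> soft = new ArrayList<>();
        List<WeakReference<byte[]>> weak = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // The arrays are reachable only through the references.
            soft.add(new SoftReference<>(new byte[1024]));
            weak.add(new WeakReference<>(new byte[1024]));
        }
        System.gc();   // a hint only; the JVM decides whether and what to collect
        return new int[] { countCleared(soft), countCleared(weak) };
    }
}
```

If the weak count comes back high while the soft count stays at zero, that JVM is treating the two types differently, at least when memory is not tight; repeating the probe while allocating toward the heap limit would show when the soft references go.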

Peter den Haan

author

Posts: 3252

posted 22 years ago

This JDC article about reference objects may be helpful, too (although I think it doesn't really cover any new points).
- Peter

Peter den Haan | peterdenhaan.com | quantum computing specialist, Objectivity Ltd
