
Long lived objects, Collections and GC

Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
Our ValueObjects are simple wrappers around Hashtables. The data in these Hashtables are either simple objects (Integer, String, etc.) or other ValueObjects. We have a session-level cache that is a simple collection of such ValueObjects.

Throughout the application flow, we hand out references to cached ValueObjects to other components. They even cross JVM boundaries.

What are the real memory leak concerns with this approach? What should be done to ensure the GC gets all the help it can to reclaim memory efficiently? How real are the GC concerns with nested collections?
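
To make the structure concrete, here is a simplified sketch with made-up names (not our actual code):

import java.io.Serializable;
import java.util.Hashtable;

// Simplified sketch of the structure described above -- names are hypothetical.
public class ValueObject implements Serializable {
    // Values are either simple objects (Integer, String, ...) or other ValueObjects.
    private final Hashtable data = new Hashtable();

    public void set(String key, Object value) {
        data.put(key, value);
    }

    public Object get(String key) {
        return data.get(key);
    }
}

// Session-level cache: a simple collection of such ValueObjects, keyed by some ID.
class SessionCache {
    private final Hashtable cache = new Hashtable();

    public void add(String id, ValueObject vo) {
        cache.put(id, vo);
    }

    public ValueObject lookup(String id) {
        return (ValueObject) cache.get(id);
    }
}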


Open Group Certified Distinguished IT Architect. Open Group Certified Master IT Architect. Sun Certified Architect (SCEA).
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
First, I assume "ValueObject" refers to what the Core J2EE Patterns authors called ValueObject, in the 1st edition? A DTO? Nothing to do with ValueObject as used in the wider patterns community, which is why in the 2nd edition they renamed the so-called ValueObject to TransferObject. Though as one can see here, the renaming process is sometimes incomplete. :roll: I'll just stick with DTO instead.

Anyway, irrelevant rant aside - I don't think there's anything special to worry about with respect to the fact that these have Hashtables inside them. Other than the fact that Hashtable and Vector are EEEVIL. (Must... avoid... another... rant...) And that aside, they're potentially big objects, so their fate is more significant than that of, say, a single Integer. The main issues are: who all is getting references to these objects, and do they hold onto those references longer than they need to?

These DTOs cross JVM boundaries - are they serialized, or are they remote objects? If the former, then any copies held by other JVMs are irrelevant to the GC in a given JVM. If they're remote objects, well, then the other JVMs are relevant, and analysis will get considerably messier. I can't think of a good reason for a DTO to be remote, so as long as you're going the serialization route you should be fine.
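
For example (just a sketch; the point is that a serialized DTO is copied when it crosses the wire, so the receiving JVM works with its own independent object):

import java.io.Serializable;
import java.math.BigDecimal;

// Sketch only: a DTO passed by value between JVMs. The receiving JVM
// deserializes its own copy, so nothing in the remote JVM can keep the
// original alive -- local GC only has to worry about local references.
public class AccountDTO implements Serializable {
    private String accountId;
    private BigDecimal balance;

    public String getAccountId() { return accountId; }
    public BigDecimal getBalance() { return balance; }
}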

You say that one DTO may contain references to other cached DTOs. This has the potential to create vast sprawling networks of interconnected objects in memory. If they're widely circulated and you aren't certain that all relevant clients drop the references when no longer needed, perhaps it would be better for each DTO to just contain the ID of the other DTOs it wants to reference? I'm assuming that if you've got a caching mechanism, there's some lookup process to get things from the cache, and whatever key you're using for the lookup, that's what I mean by ID. Anyway, this would make it easier to release objects from the cache when they may or may not ever be needed again. It creates a slight performance penalty of having to look things up in the cache again, and, if the objects have been released from the cache, a larger penalty to fetch them again. So for objects you're certain you need again, better to retain a hard reference; for objects you're not so sure about, just retain the ID. Something like the sketch below illustrates the idea.
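
Rough sketch of the ID-reference idea (hypothetical names, not meant as the real API):

import java.io.Serializable;
import java.util.Hashtable;

// Hypothetical sketch: the DTO stores the cache key of a related DTO rather
// than a hard reference to it, so the cache is free to release the other
// object; the price is a second lookup (or a reload) when it's needed again.
public class OrderDTO implements Serializable {
    private final String customerId;  // ID used as the cache key, not the DTO itself

    public OrderDTO(String customerId) {
        this.customerId = customerId;
    }

    // Resolve the related DTO on demand through whatever cache lookup already exists.
    public Object getCustomer(Hashtable cache) {
        return cache.get(customerId);
    }
}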
[ June 06, 2006: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Ajith Kallambella
Sheriff

Joined: Mar 17, 2000
Posts: 5782
I'll just stick with DTO instead.
Think of them as POJOs.

Other than the fact that Hashtable and Vector are EEEVIL.

Yup, we recognize that. This is a purchased product and I'm sure we'll end up fixing major parts of it.


These DTOs cross JVM boundaries - are they serialized, or are they remote objects?
They are serialized and passed to remote method calls as arguments. The objects themselves are not remote objects.


So for objects you're certain you need again, better to retain a hard reference; for objects you're not so sure about, just retain the ID.
Cached objects are typically object graphs of some domain structures and hence hard references are preferred over an ID-based fetch.


Unfortunately there aren't many tools that track an object's lifecycle across JVMs. Since these are large objects with complex structure, I am very concerned about memory leaks. And just to clarify, this is not an object pool, but simply a cache.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[Ajith]: Think of them as POJOs.

Oh sure - bypass the issue. That works better anyway, since it sounds like these things are used for more than just transfer, and are more complex than a traditional VO.

[Ajith]: Cached objects are typically object graphs of some domain structures and hence hard references are preferred over an ID-based fetch

OK. How well-understood are these structures? Are they constructed in a way that ensures they will be finite in scope, or do they have the potential to become an incestuous network of interlinked objects that includes most or all of the POJOs in existence? (Which would still be finite, technically, but I'm hoping you'll know what I mean without me finding a better way to articulate it.)

You may well benefit from replacing the Vectors with WeakHashMaps. That would be more like a traditional hard reference than the ID would be, while still allowing GC once weak references are the only references left.
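
Rough sketch of what I mean - keeping in mind that WeakHashMap only holds its keys weakly, so the DTO itself would be the key here (an ID-keyed map would still pin the DTOs through its values):

import java.util.Map;
import java.util.WeakHashMap;

// Sketch: entries vanish automatically once the key (the DTO here) is no longer
// strongly referenced anywhere else, because WeakHashMap holds keys weakly.
public class DtoRegistry {
    private final Map registry = new WeakHashMap();

    public void register(Object dto, Object info) {
        registry.put(dto, info);
    }

    public Object infoFor(Object dto) {
        return registry.get(dto);
    }

    public int size() {
        // Shrinks over time as unreferenced DTOs are garbage collected.
        return registry.size();
    }
}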

[Ajith]: Unfortunately there aren't many tools that track an object's lifecycle across JVMs. Since these are large objects with complex structure, I am very concerned about memory leaks.

Sounds like a reasonable concern - but the risk comes mostly from not knowing what the rest of the application is doing with these things, doesn't it? Presumably you try to analyze this as much as possible, and then fall back on careful testing to find what you may have missed. I would think that if there are memory leak issues, they will be visible within a single JVM. Can you analyze the memory usage of an individual JVM? Regardless of whether a given object may have originated on another JVM, if a serialized copy is brought into a given JVM, and then it's somehow retained in memory forever, that's a problem. This may still be nontrivial to diagnose (depending on what tools you have), but I think it can be understood within the context of one JVM at a time.
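
Even something crude helps as a first pass - e.g. logging heap usage at interesting points and watching whether "used" keeps climbing across many iterations. A profiler or -verbose:gc output tells you far more; this is just a sketch:

public class HeapWatcher {
    // Logs current heap usage; call before and after the suspect operations.
    public static void logHeap(String label) {
        Runtime rt = Runtime.getRuntime();
        long usedK = (rt.totalMemory() - rt.freeMemory()) / 1024;
        System.out.println(label + ": used=" + usedK + "K"
                + " total=" + (rt.totalMemory() / 1024) + "K"
                + " max=" + (rt.maxMemory() / 1024) + "K");
    }
}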
[ June 06, 2006: Message edited by: Jim Yingst ]
 