We're developing a web app that gets up to perhaps 5 hits/second. The work is done by routines which create a lightweight pojo, populate it from the web page, send it off (think pre-Cambrian RPC), and produce a populated pojo. There are many of these pojos, and a typical web action could cause perhaps 5 sets of requests and responses to be created. At 5 hits/second, with 5 request/response pairs per hit and 2 objects per pair, that's perhaps 50 new object creations/sec.
One developer is a total Luddite and prefers low-tech easy solutions. He's comfortable with making new input and output pojos as they are needed and letting the garbage collector do what comes naturally. That would be me.
The other developer wants to be as efficient as possible and is talking about creating a cache of pojos and reusing them to reduce work for the garbage collector.
Now the pros and cons of the first approach are clear. It's easy and requires no additional code. It is easier to test. Its only "con" is possible garbage collection overhead.
The pros and cons of the second approach aren't as clear. The pros are possible efficiency. The cons are more numerous:
1) Now you need to write some caching code. That's more to break and test.
2) The caching code needs to be synchronized. A little contention can remove any efficiencies gained, I'd think.
3) There are perhaps a thousand pojos so you need 1000 caches. You can make a class which can cache any of them. But at some point you have to create a new instance of the desired class. The developer is proposing that we do this using reflection by specifying the full package-and-class as a String and creating an instance in the caching class. Something like:
The String serves two purposes - it could be a hash key for the pool, and it also allows the pool class to instantiate the pojo if we need a new instance. That may work but to me it is quite stinky. If we passed Pojo12.class at least we wouldn't be pretending that we have loose coupling.
4) After you're done you have to remember to release your pojos back to the cache. So that's more code and testing. We're talking C-language-like memory leaks here.
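For illustration only, the String-keyed reflective pool described in point 3 could be sketched roughly as below. This is not the co-worker's actual code; StringKeyedPool, acquire() and release() are hypothetical names, and Pojo12 is just a stand-in for one generated pojo.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one of the generated pojos.
class Pojo12 {
    String name;
}

// Hypothetical pool keyed by fully qualified class name.
class StringKeyedPool {
    private final Map<String, Deque<Object>> idle = new HashMap<>();

    // Synchronized because request threads would share the pool (con 2).
    synchronized Object acquire(String className) {
        Deque<Object> queue = idle.get(className);
        if (queue != null && !queue.isEmpty()) {
            return queue.pop(); // reuse a previously released instance
        }
        try {
            // Reflection: a typo in className is only discovered at run time.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create " + className, e);
        }
    }

    // Callers must remember to hand the instance back (con 4).
    synchronized void release(Object pojo) {
        idle.computeIfAbsent(pojo.getClass().getName(), k -> new ArrayDeque<>()).push(pojo);
    }
}
```

Note that the caller gets back a plain Object and has to cast it, which is one more place where the String key and the real type can drift apart.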
Opinions please. I don't want to be a total ogre about it but I haven't had a garbage collection issue in years.
I pretty much agree with you. Generally the coding and maintenance hassles of object caching outweigh gains, unless the objects are sufficiently "heavyweight". The classic example of a heavyweight object is a Connection, since it takes considerably more time to create than a typical POJO. Threads are also often worth pooling. Other objects might be - usually if they have links to external resources beyond the data fields of the object itself. I.e., if it's more than just a POJO. (Depending how you define POJO; it's a bit subjective.)
At the very least, I'd develop the simpler (cacheless) solution first, and see how well it performs. A simple way to see how much time is consumed by GC is by invoking the JVM with java -verbose:gc. How to do this in a web server depends on the web server, I guess, but somewhere there's a JVM being invoked to kick it off, usually in the startup script. Just tweak the script to include the -verbose:gc option, and see what the output looks like. (Should go to standard out, wherever that's routed to.) My gut feeling is, creating and GCing 50 POJOs / second doesn't sound like a very big deal. Modern garbage collectors are often pretty fast (especially compared to the earliest releases) and presuming they're going to be slow is often a mistake.
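If changing the startup script is awkward, a rough in-process alternative is to read the collectors' accumulated time through the standard java.lang.management API. The sketch below is only an assumption about how one might do that; GcProbe and its nested Pojo class are hypothetical names, and the numbers are coarse (the JIT may also optimize some allocations away), so treat the result as an indication rather than a benchmark.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    // Hypothetical lightweight pojo used only to generate garbage.
    static class Pojo {
        int id;
        String name;
    }

    // Sums the accumulated collection time (ms) over all collectors.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Allocates 'count' short-lived pojos and returns the GC time (ms)
    // that accumulated in the whole JVM while doing so.
    static long gcMillisWhileAllocating(int count) {
        Pojo[] ring = new Pojo[16]; // keeps only the last few instances reachable
        long before = totalGcMillis();
        for (int i = 0; i < count; i++) {
            Pojo p = new Pojo();
            p.id = i;
            p.name = "pojo";
            ring[i & 15] = p;
        }
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        System.out.println("GC ms while allocating 1,000,000 pojos: "
                + gcMillisWhileAllocating(1_000_000));
    }
}
```

Comparing that figure with the 50 objects/second in the original question gives a feel for how much headroom there is before GC time is worth worrying about.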
To add to your list of potential problems with caching, there's the possibility that you may accidentally bleed state from one request to another, if you fail to properly clear out the data in the POJO when it's returned to the cache. Typically you need a clear() method or something similar to reset the fields. Which is just another thing to maintain - you may add a field to the POJO, but forget to add it to the clear() method. Alternatively, if you construct a new object, it's much more likely that the fields were properly initialized to whatever their defaults should be.
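The state-bleed risk can be shown with a small sketch. OrderPojo and BleedDemo are hypothetical names invented for this example; the clear() method deliberately forgets a field that was added later.

```java
// Hypothetical pojo whose clear() was not updated when 'comment' was added.
class OrderPojo {
    String user;
    String comment; // field added later

    void clear() {
        user = null;
        // Deliberate bug for illustration: 'comment' is not reset.
    }
}

class BleedDemo {
    // Simulates reusing a pooled instance for a second request.
    static String commentSeenAfterReuse() {
        OrderPojo pooled = new OrderPojo();
        pooled.user = "alice";
        pooled.comment = "private note";
        pooled.clear();          // returned to the cache
        pooled.user = "bob";     // handed out again for another request
        return pooled.comment;   // still "private note"
    }

    // Simulates simply constructing a new instance for the second request.
    static String commentSeenWithFreshObject() {
        OrderPojo fresh = new OrderPojo();
        fresh.user = "bob";
        return fresh.comment;    // null, the field default
    }
}
```

With a fresh object the language guarantees the default values; with a reused one, correctness depends on clear() being kept in step with every field.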
In the event you can't convince your coworker not to waste time on this, you may benefit from introducing some sort of factories (possibly abstract) to obtain POJO instances from, which can hide whether the instances are new or cached. A simple implementation:
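As a minimal sketch of that idea (the names PojoFactory, NewInstanceFactory and RespPojo are hypothetical, chosen only for illustration), a factory that always hands back a new instance might look like this:

```java
import java.util.function.Supplier;

// Hypothetical pojo used for the example.
class RespPojo {
    String status;
}

// Callers depend only on this interface, not on how instances are obtained.
interface PojoFactory<T> {
    T get();

    // A caching implementation would take the instance back here.
    void release(T pojo);
}

// Simplest possible implementation: always construct, never cache.
class NewInstanceFactory<T> implements PojoFactory<T> {
    private final Supplier<T> constructor;

    NewInstanceFactory(Supplier<T> constructor) {
        this.constructor = constructor;
    }

    @Override
    public T get() {
        return constructor.get();
    }

    @Override
    public void release(T pojo) {
        // Nothing to do; the garbage collector reclaims the object.
    }
}
```

A caller would write something like `PojoFactory<RespPojo> factory = new NewInstanceFactory<>(RespPojo::new);` and never need to change if a caching implementation were swapped in behind the same interface.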
You can use things like this to create your POJOs, and the other developer will know that they may replace this implementation with a more complex cache, but maybe not right away. And if they do so anyway, putting in a more complex implementation, you can easily revert to a simpler solution later, either temporarily or permanently. This can be a useful way of checking to see if the cache solution is really providing any benefits, or not. And if not, maybe you can use this to help convince the other developer to stop wasting more time. Or to convince your mutual boss that she should find you a more productive co-worker. [ October 16, 2006: Message edited by: Jim Yingst ]
Just want to strengthen Jim's point: modern garbage collectors are *very* efficient at collecting short-lived objects. It's what they are optimized for.
Thank you for the responses. When I say pojo, I mean an object that has a lightweight constructor, has no remarkable memory requirements, and has no links to external resources such as db connections or threads.
In this case, the pojos are subclasses of a baseclass but the same rules apply to the base class.
So far then, we don't see a reason to pool/cache pojos.
(revision - Upon consideration, the reference to 1000 classes in the first post was an exaggeration; we really have around 400 matched pairs of pojos. It hardly matters, but hey...)
Now, the "problem" with the factory solution above is that I have something like 400 (matching) pojo pairs, a builder and debuilder. If I add a factory in that fashion, I need 400 factories if they are explicit, and 1 if they aren't. But if they aren't explicit I have to deal with the co-worker wanting to pass the package-and-class as a String...
(aside - the pojos are all generated by a java utility that consumes a DDL-ish file. I could just as easily generate the factory classes at the same time. So it would be no "additional" code once the code generator was enhanced...)
That, however, seems to be a different issue. Issue #1 is that there seems to be no real lift in caching my pojos.
[ October 17, 2006: Message edited by: Tony Smith ]
For a laugh, google for "premature optimization evil"
Trying to come up with spiffy fixes for problems that may not exist is a recipe for yet another project that is late and full of bugs.
Bill
The problems with the factory solution seem to be largely addressed by the fact you're using code generation here. And the individual factories can be kept simple by putting much of the functionality in an abstract base class. (Or use composition rather than inheritance, maybe.) This isn't a road I'd really want to go down, but my point is that if you are unable to dissuade your co-worker from a caching solution, try to guide that solution in a form that makes it easy to switch between a caching solution and a non-caching solution. These factories I'm talking about would be part of the caching solution - quite possibly they'd be the caches themselves. But from the outside, we don't need to know whether they use caching or not.
Anyway, yeah, we all seem to agree that using caching seems pretty silly at this point. The factory stuff is just a possible compromise if you can't dissuade your co-worker.
I would also note that there's one other use case where caching might make sense - that's if you have a large number of POJOs with identical contents, and you can make the POJOs immutable. Then you may get significant benefits from sharing those immutable objects among all clients who need a POJO with the same content. Like the String pool, for example. At this point there seems to be no particular reason to think this is the case for you, so it's, again, premature optimization. But it might come up later... [ October 17, 2006: Message edited by: Jim Yingst ]
Thanks for the suggestions, again. I feel a lot better.
The truth of the matter is that the co-worker is actually my consultant and will do what I say. I sought the reality check here since I didn't want to be a totally close-minded ogre. The consultants we have are pretty decent and knowledgeable people, and it wouldn't be wise to dismiss them out of hand.
The pool is nixed since he can't give an example of where it is beneficial while the cons are considerable.
He's pretty adamant, however, that the pojo creation should be limited to one place. I'd personally hammer the instantiation directly into the code. He was uncomfortable with that, so what I offered was similar to the suggestion above:
1) ReqPojo x = AFactoryThing.get(ReqPojo.class);
Which is explicit, which I like, but the creation (or recycling) of the instance of ReqPojo is controlled in AFactoryThing.get(). The get() method would simply return a new instance of the class passed in until the time comes when it becomes insufficient, if ever.
2) ReqPojo x = ReqPojo.getInstance();
Where the static getInstance() method invokes a method in the super class, passing ReqPojo.class as an argument. This removes the AFactoryThing while allowing it to be easily inserted into the super class later. [ October 17, 2006: Message edited by: Tony Smith ]
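For reference, the two options could be sketched as follows. AFactoryThing, ReqPojo and getInstance() come from the post above; BasePojo, the newInstance() helper and the payload field are hypothetical names added only to make the example compile.

```java
// Option 1: one central place that creates (or, later, recycles) instances.
class AFactoryThing {
    static <T> T get(Class<T> type) {
        try {
            // For now, simply return a new instance of the class passed in.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}

// Hypothetical common super class of the generated pojos.
abstract class BasePojo {
    // Option 2 funnels creation through here; a factory or pool could be
    // introduced later by changing only this method.
    protected static <T extends BasePojo> T newInstance(Class<T> type) {
        return AFactoryThing.get(type);
    }
}

class ReqPojo extends BasePojo {
    String payload;

    // Option 2: explicit, type-safe entry point on the pojo itself.
    static ReqPojo getInstance() {
        return newInstance(ReqPojo.class);
    }
}
```

Either way the call site stays explicit about the type (`AFactoryThing.get(ReqPojo.class)` or `ReqPojo.getInstance()`), and no class names are passed around as Strings.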