I am working to solve a strange performance problem. The application reads data from Oracle via JDBC, reads and writes data via Esri's Arc Engine (which uses lots of native code behind a COM bridge), and writes data with Dom4j. With some output options, the app slows down the longer it runs, eventually becoming almost useless. It leaks memory, but not enough to begin swapping. (In a 30-minute test run that slows by as much as 50%, it uses less than 200 MB.) My initial hypothesis was that the native code bridge was the problem, because this has happened before. However, JProfiler data points to slowdowns across most methods, even core Java library methods. For example, calls to String.equalsIgnoreCase and DateFormat.parse run 50% to 70% slower on identical data (comparing calls from early in the run to calls after ~30 minutes). The most (but not very) plausible theories I have are:
1) Garbage collection is running during these calls. However, if that were true, it seems the time would not be attributed to these methods.
2) VM memory management data structures are becoming increasingly expensive to navigate as memory becomes "fragmented", increasing the time required to follow object references. However, it seems this would be more of an issue for allocation than for dereferencing.
Offhand I like the garbage collection theory. I'm not sure how JProfiler gathers its data, but it seems plausible to me that it might not have a way of measuring how much time is being spent by a method in its own thread. That is, it may just be looking at how much total time it takes to execute a method (the difference between start time and end time) without knowing how much of that time was actually sucked up by other threads, such as GC. Someone more familiar with JProfiler can address that.
One quick way to test this is to run with the JVM option -verbose:gc (as in "java -verbose:gc MyClass"). This prints extra GC info to standard out, including statements of how much time each collection takes. By looking at this output you may be able to tell whether the cumulative time spent in GC is sufficient to account for the overall slowdown of the other methods.
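Alongside eyeballing the -verbose:gc output, you can also query cumulative GC time from inside the program via the java.lang.management API (available since Java 5). A minimal sketch, with GcTimeProbe being a hypothetical name; logging this number periodically shows whether GC time grows in step with the slowdown:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {
    // Sum cumulative GC time (in ms) across all collectors in this JVM.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // may be -1 if undefined for a collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Churn some memory so a collection is likely to run.
        for (int i = 0; i < 1_000_000; i++) {
            String s = new String("garbage-" + i);
        }
        long after = totalGcMillis();
        System.out.println("GC time so far: " + after + " ms (delta " + (after - before) + " ms)");
    }
}
```

If the delta between an early snapshot and a late snapshot is small relative to the observed slowdown, GC is probably not the culprit.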
Regarding theory 2, yes, it does seem that would be more of an issue with allocation than with dereferencing. And a method like equalsIgnoreCase() doesn't seem to allocate any new memory other than on the stack, which I would expect to be unaffected. So this theory seems less plausible to me.
Another thing to try is to take the Esri Arc Engine out of the picture by creating a Java-only performance test. Get some sample data loaded into memory and then run that same data repeatedly through Dom4j and the rest of the app, and see if the performance problem can be replicated. Some performance issues may depend on having a bunch of different data, but many will not. Similarly, it might be nice to take Dom4j out of the picture, but that may be more entwined in your code. Perhaps it can be mocked out? Or replaced with another XML framework? Just some avenues to explore.
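The kind of replay harness described above can be sketched roughly like this. Everything here is hypothetical scaffolding: processOnce is a stand-in for the real Dom4j/processing step, and the point is only the shape of the loop, which feeds identical data through repeatedly and records per-iteration times to see if they creep upward:

```java
public class ReplayBenchmark {
    // Hypothetical stand-in for the real Dom4j/processing step.
    static int processOnce(String[] records) {
        int matches = 0;
        for (String r : records) {
            if (r.equalsIgnoreCase("TARGET")) {
                matches++;
            }
        }
        return matches;
    }

    // Run the same data through the pipeline repeatedly. If per-iteration
    // time grows with no change in input, the slowdown reproduces without
    // Arc Engine in the picture.
    public static long[] timeIterations(String[] records, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            processOnce(records);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        String[] data = new String[10_000];
        java.util.Arrays.fill(data, "record");
        data[42] = "target";
        long[] times = timeIterations(data, 20);
        // Compare an early iteration to a late one, skipping warm-up.
        System.out.println("early: " + times[1] + " ns, late: " + times[times.length - 2] + " ns");
    }
}
```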
Are you sure every single call takes longer, or are there perhaps just *more calls* that sum up to a bigger total of time spent in those methods?
Originally posted by Jim Yingst: Offhand I like the garbage collection theory. I'm not sure how JProfiler gathers its data, but it seems plausible to me that it might not have a way of measuring how much time is being spent by a method in its own thread.
When you create a profile for a new session, you can select the data collection strategy. I'm quite positive that the more involved strategies in fact measure the true time a method call takes in its own thread.
OK - so conversely, it's also possible for a user unfamiliar with the intricacies of JProfiler to select a simpler data collection method that would give misleading results here, yes? Then that would be something for Bill to check, and the good news is that there's a fix already available.
Bill, I'd be interested to hear if you get more info on this problem, as it does seem intriguing.
Jim: Thanks for the suggestions. The -verbose:gc option may help shed light on theory #1. Trying to remove Arc Engine so we can test a Java-only version of the code would be extremely expensive if we try to leave the code in anything like a meaningful state. Arc Engine is central to the application. We did try a trivial test like this (allocated lots of objects and timed calls to equalsIgnoreCase) but it ran just fine. Because this was so different from the real thing, I don't think we learned anything from this.
Ilja: The application processes data sets. To get clean comparison data, we processed the same data set repeatedly and timed the second and the next-to-last instances (rather than the first and last, so we don't pick up any startup or shutdown artifacts). So the input and output data, and (presumably) all the processing between those two instances, are identical. In particular, I compared the number of invocations of each method, and the counts were all the same, even though the time spent in them was greater.
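For this count-versus-duration comparison, a small hand-rolled probe can rule out "more calls" independently of the profiler. This is a sketch, not anything from JProfiler; CallStats is a hypothetical name, and the idea is just to record both invocation count and total time so per-call averages from an early window and a late window can be compared:

```java
import java.util.concurrent.atomic.AtomicLong;

// Separates "more calls" from "slower calls": if counts match between an
// early and a late window but averageNanos() diverges, each call really
// is getting slower.
public class CallStats {
    final AtomicLong calls = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    public void record(long elapsedNanos) {
        calls.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
    }

    // Average time per call in nanoseconds (0 if never called).
    public long averageNanos() {
        long n = calls.get();
        return n == 0 ? 0 : totalNanos.get() / n;
    }
}
```

Usage would wrap the suspect call, e.g. `long t0 = System.nanoTime(); s.equalsIgnoreCase(other); stats.record(System.nanoTime() - t0);`.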
Monday I will be able to look at whether our version (3.3.2) of JProfiler has data collection strategy options that we should try.
The time reported by -verbose:gc does not appear to be growing in step with the application's slowdown. Without more complicated instrumentation it's hard to be precise, but that appears to be the case.
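To be a little more precise without heavier instrumentation, the -verbose:gc output can be summed mechanically. A sketch, assuming the classic HotSpot line shape such as `[GC 8480K->6392K(32704K), 0.0040000 secs]`; the exact format varies by JVM version, so the regex is an assumption to adjust for your JVM's actual output:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseGcSummer {
    // Matches the trailing "0.0123456 secs]" of a classic HotSpot
    // -verbose:gc line; adjust for your JVM's log format.
    private static final Pattern GC_SECS = Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    // Sum the GC pause seconds reported across all matching log lines.
    public static double totalGcSeconds(Iterable<String> logLines) {
        double total = 0.0;
        for (String line : logLines) {
            Matcher m = GC_SECS.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }
}
```

Running this over the log captured early in the run versus late in the run gives two comparable totals.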
I switched the JProfiler session to the "Full Instrumentation" call collection strategy, and I'm still not seeing garbage collection showing up explicitly among the times charged to any methods. I don't see a garbage collection method under java.* that I can add to the list of methods to "un"filter. Is garbage collection done within the VM itself rather than by a java class?
[Bill]: I switched the JProfiler session to the "Full Instrumentation" call collection strategy, and I'm still not seeing garbage collection showing up explicitly among the times charged to any methods. I don't see a garbage collection method under java.* that I can add to the list of methods to "un"filter. Is garbage collection done within the VM itself rather than by a java class?
Typically, yes. If you call System.gc() then you'll probably see the time for that method call show up, as it runs in that thread and is caused specifically by that call. And if a method throws an OutOfMemoryError, it may well incur some time spent on GC beforehand, and again that's typically in the same thread that will throw the error (assuming sufficient memory cannot be freed). But outside those exceptional circumstances (almost no one calls System.gc() anymore), GC is generally done in separate threads by native code in the JVM, and I wouldn't expect it to show up in the times reported for other methods and threads (except perhaps by accident, if using a simpler collection strategy).
I'm a bit surprised JProfiler doesn't seem to have some sort of hook for monitoring GC more directly - perhaps Ilja or someone else can point something out. If nothing else, it seems like they could write something that scans the output from -verbose:gc and compiles statistics based on it.
Hm, this screenshot indicates there's a tab for GC activity. Does that help?
Still, if you're seeing a 50% slowdown, that seems large enough that you should be able to notice it just from eyeballing the output of -verbose:gc, if GC were really the problem. It's sounding like that's probably not the case.
With full instrumentation running, are you still seeing 50% increase in time for methods like equalsIgnoreCase()?
I don't suppose the program is spawning an increasing number of threads as time goes on? Or for that matter, could there be other processes spawned outside the JVM that are slowing the machine down? If you spawn processes with Runtime.exec(), for example, I'm not sure how they'd be counted by a profiler. They're probably processes outside the JVM, connected only by a few pipes. It might be good to take a look at the output of top or Task Manager to make sure there's not something else going on affecting the whole box.
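The thread-growth hypothesis is easy to check from inside the JVM with the standard ThreadMXBean. A minimal sketch (ThreadGrowthProbe is a hypothetical name); log these counts periodically during a long run, and a steadily climbing live count points at a thread leak:

```java
import java.lang.management.ManagementFactory;

public class ThreadGrowthProbe {
    // Number of live threads (daemon and non-daemon) in this JVM right now.
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Highest live-thread count seen since JVM start (or since last reset).
    public static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live=" + liveThreads() + " peak=" + peakThreads());
    }
}
```

This won't see child processes spawned via Runtime.exec(), though, so top or Task Manager is still the right tool for checking activity outside the JVM.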