How to interpret multi-thread test results

Jay Damon
Ranch Hand

Joined: Jul 31, 2001
Posts: 282
I have a multi-threaded test that I am working on. My question is: How should I interpret the results? The test creates an arbitrary number of threads (5 in this example), with each thread obtaining a Connection, retrieving data from approximately 10 database tables (not counting reference tables, which are cached on first access), creating objects and establishing relationships and, finally, creating XML output data.

The Test class maintains a timer for each thread as well as a static timer to measure the elapsed time between the start of the first thread and the end of the last thread to complete. Here are the example results:

Instance 0001 Elapsed Time = 0:09:031
Instance 0003 Elapsed Time = 0:07:953
Instance 0002 Elapsed Time = 0:08:703
Instance 0004 Elapsed Time = 0:06:953
Instance 0005 Elapsed Time = 0:06:094

Total object finds = 5
Total elapsed time = 9.391 (seconds)
Average access time = 1.878 (seconds)
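
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a harness along the lines described above. It is not the original Test class: the names `MultiThreadTimer`, `Result`, `run` and `simulatedWork` are hypothetical, the workload is a plain sleep standing in for the database and XML work, and the optional delay between thread starts is an assumption added for experimentation. "Average access time" is computed as total elapsed time divided by the number of instances, which matches the figures above (9.391 / 5 = 1.878).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical harness: class and method names are illustrative assumptions.
class MultiThreadTimer {

    /** Per-thread elapsed times plus the wall-clock time for the whole run. */
    static final class Result {
        final long[] perThreadMillis;
        final long totalMillis;

        Result(long[] perThreadMillis, long totalMillis) {
            this.perThreadMillis = perThreadMillis;
            this.totalMillis = totalMillis;
        }

        /** Total elapsed time divided by the number of instances. */
        double averageAccessMillis() {
            return (double) totalMillis / perThreadMillis.length;
        }
    }

    /** Stand-in workload that just sleeps; a real test would hit the database. */
    static Runnable simulatedWork(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    static Result run(int threadCount, long startIntervalMillis, Runnable work) {
        AtomicLongArray elapsed = new AtomicLongArray(threadCount);
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime(); // overall timer, started before the first thread
        try {
            for (int i = 0; i < threadCount; i++) {
                final int id = i;
                Thread t = new Thread(() -> {
                    long t0 = System.nanoTime(); // per-thread timer
                    work.run();
                    elapsed.set(id, (System.nanoTime() - t0) / 1_000_000);
                }, "Instance-" + (i + 1));
                threads.add(t);
                t.start();
                if (startIntervalMillis > 0 && i < threadCount - 1) {
                    Thread.sleep(startIntervalMillis); // optional spacing between starts
                }
            }
            for (Thread t : threads) {
                t.join(); // wait for the last thread to complete
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running test", e);
        }
        long totalMillis = (System.nanoTime() - start) / 1_000_000;
        long[] perThread = new long[threadCount];
        for (int i = 0; i < threadCount; i++) {
            perThread[i] = elapsed.get(i);
        }
        return new Result(perThread, totalMillis);
    }

    public static void main(String[] args) {
        Result r = run(5, 0, simulatedWork(200));
        for (int i = 0; i < r.perThreadMillis.length; i++) {
            System.out.println("Instance " + (i + 1) + " elapsed ms = " + r.perThreadMillis[i]);
        }
        System.out.println("Total elapsed ms = " + r.totalMillis);
        System.out.println("Average access ms = " + r.averageAccessMillis());
    }
}
```

With a sleep as the workload the threads do not compete for CPU, so the per-thread times stay close to the sleep duration; a CPU-bound or database-bound workload would behave differently.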

I am pretty pleased with the total elapsed time and average access time. However, I am concerned about the elapsed times for the individual instances. My interpretation of these results is that there is too much work to be performed during a single timeslice. So, even though the average time is respectable, the actual elapsed time for each individual instance is comparable to the total elapsed time because multiple timeslices are required to complete the work performed by each instance.

Is this a correct interpretation? Or, is there another explanation?

If this is, in fact, a timeslice issue, is there any way, i.e. a JVM parameter, to modify the timeslice allotted to each individual instance?
[ November 21, 2005: Message edited by: Jay Damon ]
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18906

"the actual elapsed time for each individual instance is comparable to the total elapsed time because multiple timeslices are required to complete the work performed by each instance."
My interpretation would be that the total elapsed time is equal to the startup and shutdown time plus the longest individual instance time. It certainly can't be less than that and there's no reason for it to be any more. So if startup and shutdown are small, I would expect the total elapsed time to be pretty much the same as the individual instance times.
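
As a quick check of that reading against the figures in the first post: the longest instance took 9.031 seconds and the total was 9.391 seconds, which leaves about 0.360 seconds for startup and shutdown. The small sketch below does that subtraction; the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; the names here are hypothetical.
class ElapsedTimeCheck {

    /** Longest of the individual instance times, in seconds. */
    static double longest(double[] seconds) {
        double max = 0.0;
        for (double s : seconds) {
            max = Math.max(max, s);
        }
        return max;
    }

    /** Time not accounted for by the longest instance (startup plus shutdown). */
    static double overhead(double totalSeconds, double[] instanceSeconds) {
        return totalSeconds - longest(instanceSeconds);
    }

    public static void main(String[] args) {
        // Instance times and total from the first post.
        double[] instances = {9.031, 7.953, 8.703, 6.953, 6.094};
        double total = 9.391;
        System.out.println("longest instance (s): " + longest(instances));
        System.out.println("startup + shutdown (s): " + overhead(total, instances));
    }
}
```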

Now if your comparison test was to run this with only one thread, I suspect you would see some different numbers.
Stefan Wagner
Ranch Hand

Joined: Jun 02, 2003
Posts: 1923

"My interpretation of these results is that there is too much work to be performed during a single timeslice."


I don't think so.
Even the shortest thread needed more time to finish than the average time, so it was pausing without being able to work.
IMHO using threads makes sense here.
While the first thread connects to the database and waits for a response, you aren't wasting time; you are starting the other threads.


http://home.arcor.de/hirnstrom/bewerbung
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12823
"If this is, in fact, a timeslice issue, is there any way, i.e. a JVM parameter, to modify the timeslice allotted to each individual instance?"

Thread priority is the only control you have over the allotment of CPU time to the individual instances. By default each Thread created gets the priority of the creating Thread.
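
A small sketch of what that looks like in code, using only `Thread.getPriority()` and `Thread.setPriority()`. The class and method names are hypothetical. Keep in mind that priority is a hint to the scheduler, and how much effect it has depends on the platform.

```java
// Hypothetical demo class; only the java.lang.Thread calls are real API.
class PriorityDemo {

    /** A freshly created thread starts with the priority of the thread that created it. */
    static int inheritedPriority() {
        Thread child = new Thread(() -> { });
        return child.getPriority();
    }

    /** Creates (but does not start) a thread that asks for the lowest priority. */
    static Thread lowPriorityThread(Runnable work) {
        Thread t = new Thread(work);
        t.setPriority(Thread.MIN_PRIORITY); // a scheduling hint, not a guarantee
        return t;
    }

    public static void main(String[] args) {
        System.out.println("creator priority:   " + Thread.currentThread().getPriority());
        System.out.println("inherited priority: " + inheritedPriority());
        System.out.println("lowered priority:   " + lowPriorityThread(() -> { }).getPriority());
    }
}
```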
Bill
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18906

"My interpretation of these results is that there is too much work to be performed during a single timeslice."
I think that goes without saying. Timeslices are only going to be very small fractions of a second, and your tasks take much, much longer than that. Anyway if you could arrange things so that the timeslice was long enough for each of your tasks to fit into one timeslice each, you would then have your tasks running sequentially, i.e. using threads would be pointless.
Jay Damon
Ranch Hand

Joined: Jul 31, 2001
Posts: 282
Thanks for the comments thus far. I am getting hammered on this so any insight that anyone can provide is appreciated.

FYI, the application I am working on is an insurance application with policies from 1 to 500 lines. The application has a maximum of 200 users at any given point in time.

After executing more tests, here is what I have observed:

1. If I increase the interval between individual thread creation, the individual thread elapsed times become respectable. However, the total elapsed time from the start of the first thread to the end of the last thread (to complete) appears to increase slightly.

For example, with an interval of 1000 milliseconds:

Instance 0002 Elapsed Time with XML = 0:04:812
Instance 0001 Elapsed Time with XML = 0:06:343
Instance 0003 Elapsed Time with XML = 0:04:282
Instance 0005 Elapsed Time with XML = 0:02:219
Instance 0004 Elapsed Time with XML = 0:04:171

Total elapsed time = 0:10:156
Average access time = 2.031

With an interval of 2000 milliseconds:

Instance 0001 Elapsed Time with XML = 0:04:532
Instance 0002 Elapsed Time with XML = 0:03:359
Instance 0003 Elapsed Time with XML = 0:02:485
Instance 0004 Elapsed Time with XML = 0:03:687
Instance 0005 Elapsed Time with XML = 0:02:156

Total elapsed time = 0:11:766
Average access time = 2.353

The individual thread elapsed times start to approach the times I have observed if I run each individual test on the same thread.

It would seem to me that providing some "spacing" between thread creation allows the test instances an opportunity to at least partially execute, thus requiring fewer timeslices.

2. The original test policies, for the times listed in the original post and for 1. above, represent some of our largest. However, these policies represent only 1/2 of 1 percent of all policies. I modified my original test such that it will query all policies and select one at random for each thread instance.

I have found that, using this test scenario (on my development machine), I get a good sampling of both large and small policies and can decrease the thread interval to as little as 300 milliseconds without any performance degradation. Using this thread interval, the test can handily process 200 policies per minute. Individual thread elapsed times generally range from 0.250 to 3.000 seconds. I would expect a web server to do better.
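
The 200-per-minute figure follows directly from the 300 millisecond interval: 60,000 / 300 = 200 thread starts per minute. The sketch below also estimates how many requests are in flight at once, using Little's law (average in flight = arrival rate x average response time). That second calculation is an addition for illustration, not something measured in the test, and the names are hypothetical.

```java
// Illustrative arithmetic; class and method names are hypothetical.
class ArrivalMath {

    /** Thread starts per minute for a fixed interval between starts. */
    static double startsPerMinute(long intervalMillis) {
        return 60_000.0 / intervalMillis;
    }

    /** Little's law: average requests in flight = arrival rate x response time. */
    static double averageInFlight(long intervalMillis, long responseMillis) {
        return (double) responseMillis / intervalMillis;
    }

    public static void main(String[] args) {
        System.out.println("starts per minute at 300 ms: " + startsPerMinute(300));
        // Response times in the post ranged from 0.250 to 3.000 seconds.
        System.out.println("in flight if all take 250 ms:  " + averageInFlight(300, 250));
        System.out.println("in flight if all take 3000 ms: " + averageInFlight(300, 3000));
    }
}
```

If every request were a 3-second large policy, roughly ten would overlap at a 300 ms interval, which is a much heavier concurrent load than a random mix of mostly small policies.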

This has become a major issue because an outside company is performing load tests of our web application. The tests are being performed using the largest policies with a relatively small interval between page requests. The times are (I would agree) outrageous with response times of 30 seconds or more.

First of all, I believe the test is unrealistic because it accesses large policies at frequencies much higher than normal usage. Second, I believe the test results would indicate that we require more web servers, rather than that a problem exists with the application, if the application were in fact to experience this level of usage. Third, the test is being conducted on a single web server whereas our production environment has dual web servers. Fourth, I asked our support people today if our production users were reporting any significant performance issues. The answer was no.

The outside company has pointed a finger squarely at my code. However, the test in the original post and the tests performed in 1. and 2. above perform the same functions as the "bottleneck" method in the web application that invokes my code. I agree that my code takes a relatively long time to execute but it performs a significant chunk of the application's tasks. And, as I indicated above, I can achieve a throughput of 200 transactions per minute on my development machine so I fail to understand how the web application is experiencing this bottleneck.

Personally, I'm not sure there is actually a problem except under a large load. I think the problem has more to do with the relatively large transactions and relatively short thread creation intervals than any code bottleneck.

However, our managers are "concerned", so we are now pushing ahead with initiatives to: 1) upgrade to the latest Tomcat version, 2) upgrade to the latest JVM, 3) test different JVM parameter configurations, 4) create a test environment using a 64-bit server, 5) use multiple JVMs on a server, etc. Will any or all of these provide any significant performance improvement?

I'm not concerned that the load tests point a finger at my code. I am just trying to understand how my code can perform so well under load during tests on my development machine but perform miserably under load tests of the web application. I would expect it to perform better on the web server. Are there any significant differences between my development (Eclipse) JVM and the web server JVM that I should be aware of?

I would appreciate any comment. Thanks.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12823
I think you should forget about thinking in terms of "time slices" - let the JVM do its own Thread handling. It sounds to me like the creation of the DB connection is the real place to look. This big surge of attempts to create a connection may overwhelm the DB side of things. Creating a new connection for each query is really wasteful.

So why are you not using a connection pool? The mechanism has been in the standard library since 1.4 - or you can use one of the many open source implementations.
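
To illustrate the idea (not any particular library), here is a minimal sketch of a fixed-size pool built on a `BlockingQueue`. `SimplePool` and its methods are hypothetical names, and it pools generic objects rather than real JDBC connections so that it runs without a database. A production setup would normally use a `javax.sql.DataSource` backed by the container's pool or an established pooling library instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; a sketch of the concept, not a real library API.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    /** Creates all pooled resources up front, so creation cost is paid once. */
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Waits up to the timeout for a free resource; returns null if none frees up. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Hands a resource back so another thread can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    int idleCount() {
        return idle.size();
    }
}
```

A caller would borrow, do its work, and give the resource back in a `finally` block; when every resource is checked out, further callers wait, which is the point at which an undersized pool shows up as long response times.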

Bill
Jay Damon
Ranch Hand

Joined: Jul 31, 2001
Posts: 282
Bill,

Thanks for the info.

I may not have been clear in my description of the tests being performed. On the web server (application), connection pooling is used. It is only when executing tests on my development machine that I actually create Connection objects for use. And, as I indicated, I can achieve reasonable response times in the standalone tests on my development machine depending upon the size of the interval between individual thread creation. I don't think Connection creation is the issue unless perhaps you think the connection pool on the web server may be too small?

I am not so concerned about timeslices as I am about trying to explain what I am observing in the tests on the web server and my development machine. I am content to let the JVM do its own thread handling. However, from my observations, it appears that when relatively large policies are submitted one after another with relatively short intervals between thread creation, the JVM is overwhelmed; there is too much work for it to handle and, for lack of a better term, it begins "thrashing".

Given the correlation between increased policy size / decreased thread interval and the amount of throughput that can be achieved in a standalone test environment, this is the only explanation that appears to me to fit the facts. However, I would welcome other theories as to what is going on here.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18906

"when relatively large policies are submitted one after another with relatively short intervals between thread creation, the JVM is overwhelmed"
To me this sounds more like the large policies are consuming large amounts of memory, causing garbage collection to run more often. But that's just another hypothesis.
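
One way to test that hypothesis is to read the JVM's own garbage collection counters before and after a test run. The sketch below uses the standard `java.lang.management` API (`ManagementFactory.getGarbageCollectorMXBeans()`, available since Java 5); the `GcSnapshot` class itself is a hypothetical helper. Running the JVM with GC logging enabled is another option.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper around the standard GarbageCollectorMXBean counters.
class GcSnapshot {
    final long collections; // number of collections so far, summed over all collectors
    final long millis;      // approximate accumulated collection time in milliseconds

    GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    /** Reads the current counters; a collector reporting -1 (undefined) counts as 0. */
    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    /** Difference between this snapshot and an earlier one. */
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections, millis - earlier.millis);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        // ... run the multi-threaded test here ...
        GcSnapshot delta = take().since(before);
        System.out.println("collections during test: " + delta.collections);
        System.out.println("GC time during test (ms): " + delta.millis);
    }
}
```

If GC time is a large share of the elapsed time for the large-policy runs and negligible for the small ones, the memory explanation gains weight.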

If it was me I wouldn't be happy until the machine I was using to test these hypotheses was exactly the same as the machine you are trying to simulate. Your test machine doesn't have connection pooling, but the production machine does. The production machine is running Tomcat but the test machine...? And what about memory on the two machines?
Harald Kirsch
Ranch Hand

Joined: Oct 14, 2005
Posts: 37
"The individual thread elapsed times start to approach the times I have observed if I run each individual test on the same thread."


Multithreading only pays if there is CPU power to spare. When you run each individual test in the same thread, how much idle time does the CPU have? Put another way: let's call cpuuse := (100% - idle). If cpuuse is >50%, a second thread would want to increase it to >100%, which is not possible. Consequently the duration to finish two jobs will increase beyond 2x(one job).

If cpuuse is close to 100% for one thread already, there is no point in starting several threads, because it only creates overhead. (In all cases, cpuuse includes cycles burned by the database.)
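
A rough way to put numbers on this, under a deliberately simplified model that is an assumption of this sketch and not a measurement: treat cpuuse as the fraction of one core a single job keeps busy, multiply by the number of concurrent threads to get the total CPU demand, and compare that with the number of cores. While demand fits within the cores, each job runs at about its single-thread speed; beyond that, each job is stretched by the ratio of demand to capacity. The names are hypothetical, and the model ignores context-switch overhead and contention on the database.

```java
// Simplified capacity model; names and the model itself are assumptions.
class CpuBudget {

    /**
     * Estimated factor by which each job's elapsed time grows when `threads`
     * jobs run at once on `cores` cores, where each job alone keeps one core
     * busy for the fraction `cpuUsePerJob` (0.0 to 1.0) of its run time.
     */
    static double slowdown(double cpuUsePerJob, int threads, int cores) {
        double demand = cpuUsePerJob * threads; // total cores' worth of work requested
        return Math.max(1.0, demand / cores);   // no slowdown while demand fits
    }

    public static void main(String[] args) {
        System.out.println("40% busy, 2 threads, 1 core:  x" + slowdown(0.4, 2, 1));
        System.out.println("60% busy, 2 threads, 1 core:  x" + slowdown(0.6, 2, 1));
        System.out.println("100% busy, 5 threads, 1 core: x" + slowdown(1.0, 5, 1));
    }
}
```

Under this model, five fully CPU-bound jobs on one core each take about five times as long as one job alone, so the total elapsed time is no better than running them one after another.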


Harald.
Jay Damon
Ranch Hand

Joined: Jul 31, 2001
Posts: 282
"To me this sounds more like the large policies are consuming large amounts of memory, causing garbage collection to run more often. But that's just another hypothesis."


This is true. A large policy may easily consume 4-5 MB of memory (I have conducted memory usage tests). I have not worried about that because: 1) each of our 2 servers has 2GB of memory, 2) these policies represent only 1/2 of 1 percent of all policies, 3) my elapsed time tests show these large policy objects will take 3 seconds or less to execute (in a single-threaded environment), and 4) there are never more than 200 users. So memory should not be an issue.
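
To make the reasoning above concrete, here is the arithmetic as a small sketch with hypothetical names. With 200 users and large policies at 0.5 percent of the total, about one large policy would be in memory at a time under normal usage. The worst case, where all 200 users hold a 5 MB policy at once, is about 1000 MB. A load test that uses only the largest policies sits much closer to the worst case than to normal usage.

```java
// Illustrative arithmetic; class and method names are hypothetical.
class PolicyMemoryMath {

    /** Memory needed if every concurrent user holds one policy of the given size. */
    static long worstCaseMegabytes(int concurrentUsers, int megabytesPerPolicy) {
        return (long) concurrentUsers * megabytesPerPolicy;
    }

    /** Expected number of large policies in use at once under normal usage. */
    static double expectedLargePolicies(int concurrentUsers, double largeFraction) {
        return concurrentUsers * largeFraction;
    }

    public static void main(String[] args) {
        // Figures from the post: 200 users, up to 5 MB per large policy, 0.5% large.
        System.out.println("worst case (MB): " + worstCaseMegabytes(200, 5));
        System.out.println("expected large policies at once: " + expectedLargePolicies(200, 0.005));
    }
}
```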

I agree that I would like to perform the tests myself but I have no access to the web server box. I can tell you that my development machine has 1.25 GB of memory. As I indicated above, each of our servers has 2GB although I am told that only 1GB can be configured for the JVM. Apparently, if our administrator attempts to configure more than that, he receives an "Invalid startup options" error. (I'm just relaying what I have been told.) And you are correct, the web server is running Tomcat; the unit tests executed on my development machine are not.
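
Since the heap limits on the two machines are only known second-hand, one simple check is to have each JVM report the maximum heap it is actually running with. The sketch below uses `Runtime.getRuntime().maxMemory()`, which reflects the limit normally set with the `-Xmx` option; the `HeapReport` class name is hypothetical.

```java
// Hypothetical helper; Runtime.maxMemory() is the standard API call.
class HeapReport {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMegabytes());
    }
}
```

Running this (or logging the same value at startup) on both the development machine and the web server would show whether the two environments are comparable on memory.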

"Multithreading only pays if there is CPU power to spare. When you run each individual test in the same thread, how much idle time does the CPU have?"


I am told that, during the load tests on the web server, total CPU utilization does not exceed 50 percent. I'm not sure what that tells me. Note also that this is a dual-CPU box.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12823
So in the tests on your local machine, how much of the observed time is consumed in getting the Connection in the first place versus executing the data retrievals and formatting? What does the CPU load on your local machine look like during all of this - is it spending all its time waiting for the DB server?

Bill
Jay Damon
Ranch Hand

Joined: Jul 31, 2001
Posts: 282
Bill,

On my local machine, I have to create a Connection as well as a security object to allow database access. When profiling on my local machine, the methods that perform these tasks are #1 and #3 in terms of overall time so I would think that indicates the rest of my code is performing pretty well. The CPU usage is pegged.

That said, we have since discovered that the problem is not with my code but rather that the number of Connections in the database pool was insufficient for the load test on the web server. As I indicated previously, the load test being conducted was not representative of normal usage. The Connection pool limit was 100. When increased to 200, the application ran just fine.

The problem is that existing code (which we inherited from a 3rd party) is poorly written and appears to "hang on" to Connections rather than use the database pool as it was intended to be used. My code represents the first steps towards replacing this code. However, the two sets of code will have to co-exist for a while.
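
The difference between holding on to a pooled connection and returning it promptly can be sketched without a database by modelling the pool limit as a `Semaphore`. Everything here (`ConnectionBudget`, `Lease`) is a hypothetical stand-in, not the inherited code or the real pool. With real JDBC the same pattern is a try-with-resources (or try/finally) around the `Connection` obtained from the pool, so that `close()` returns it as soon as the database work is done.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a pool limit; not a real connection pool.
class ConnectionBudget {
    private final Semaphore permits;

    ConnectionBudget(int maxConnections) {
        permits = new Semaphore(maxConnections);
    }

    /** Stands in for a borrowed connection; closing it returns the slot to the pool. */
    final class Lease implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            if (closed.compareAndSet(false, true)) { // closing twice releases only once
                permits.release();
            }
        }
    }

    /** Returns a lease, or null when every slot is already taken. */
    Lease tryAcquire() {
        return permits.tryAcquire() ? new Lease() : null;
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ConnectionBudget pool = new ConnectionBudget(2);

        // Well-behaved caller: the slot is returned when the block ends.
        try (Lease lease = pool.tryAcquire()) {
            System.out.println("inside block, available = " + pool.available());
        }
        System.out.println("after block, available = " + pool.available());

        // Caller that hangs on to its leases: the pool runs dry.
        Lease held1 = pool.tryAcquire();
        Lease held2 = pool.tryAcquire();
        System.out.println("next caller gets a lease? " + (pool.tryAcquire() != null));
        held1.close();
        held2.close();
    }
}
```

Raising the pool limit, as was done here, hides the problem under the tested load; code that releases connections promptly needs far fewer of them for the same number of users.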

Jay
 