I ran into the following problem: when an application intensively creates and drops objects, it runs faster on 1 CPU than on 4. I encountered this while writing a benchmark to compare server performance for our application. The benchmark is quite simple: it runs 200 threads simultaneously. Each thread creates several thousand objects. When a thread finishes, it is replaced by a new thread. The benchmark runs 5000 threads in total. The result is the mean number of objects created per millisecond. For a server with 2 Intel Xeon 2.4 CPUs, the results were:

1 CPU: 410
2 CPUs: 355
4 CPUs: 272

This means that each additional CPU reduces overall system performance. I can send the benchmark source code to anybody interested.
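For readers who want to try something similar, here is a minimal sketch of this kind of benchmark. It is not the original PSB-A source: the class name `AllocBench`, the method `run`, and the per-task object count are made up for illustration. It also uses a fixed thread pool, which reuses worker threads instead of starting a fresh thread for every task as described above, and it needs Java 8 or later rather than 1.4.2.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocation benchmark; names and parameters are illustrative only.
public class AllocBench {

    // Runs totalTasks short-lived tasks with at most activeThreads running at once.
    // Each task allocates objectsPerTask small arrays. Returns the number of objects created.
    static long run(int activeThreads, int totalTasks, int objectsPerTask) {
        AtomicLong created = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(activeThreads);
        for (int t = 0; t < totalTasks; t++) {
            pool.execute(() -> {
                // Keep a few references alive so the allocations are not trivially dead.
                Object[] keep = new Object[16];
                for (int i = 0; i < objectsPerTask; i++) {
                    keep[i % keep.length] = new int[64];
                }
                created.addAndGet(objectsPerTask);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                throw new IllegalStateException("benchmark timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long created = run(200, 5000, 5000); // 200 active, 5000 tasks, as in the post
        long elapsedMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        System.out.println("objects per ms = " + created / elapsedMs);
    }
}
```

To compare CPU counts, the same jar would be run with the process restricted to 1, 2, or 4 processors by whatever means the operating system offers. A JIT compiler may optimize some of these allocations away, so treat the numbers as rough.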
The depth of understanding is defined with personal experience -- Miyamoto Musashi
The articles Tuning Hotspot 1.4.2 and GC Portal should provide guidance on tuning the garbage collector for multiprocessor systems. The concurrent garbage collector is one example of a VM improvement aimed at greater scalability.
Additional details: please see the test report.

P:\PSB>java -server -Xms700M -Xmx700M -jar PSB-A.jar

Computer performance report. Test PSB-A (c)ITC-M:
  Test timestamp = Thu Feb 26 09:06:51 EET 2004
  JVM = 1.4.2_02
  Count of test threads = 2000
  Count of simultaneously active test threads = 200
  Count of levels in ObjectA tree: 7
  Arity of ObjectA tree = 4
  Size of array in ObjectA (in 4 byte words) = 64
  Explicit activation of garbage collector = false
  Explicit nulling of ref to ObjectA just before test thread finishes = true

TEST RESULTS
  Test has executed normally = true
  Sleep time between thread updates = 5
  Count of created object = 43690000
  Count of started threads = 2000
  Count of finished threads = 2000
  Test duration (msec) = 97999
  Mean number of objects handled (created, accessed, destroyed) during 1 msec (PSB-A) = 445
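The numbers in the report can be cross-checked with a small sketch. It assumes, without having seen the PSB-A source, that each thread builds one 4-ary tree with a root plus 7 levels below it, and that only the tree nodes are counted as objects. The class `ObjectATree` and its nested `ObjectA` are reconstructions for illustration, not the original code.

```java
// Hypothetical reconstruction of the ObjectA tree described in the report.
public class ObjectATree {

    static final class ObjectA {
        final int[] payload = new int[64]; // 64 four-byte words, as in the report
        final ObjectA[] children;

        // levelsBelow = number of tree levels still to build under this node.
        ObjectA(int levelsBelow, int arity) {
            children = new ObjectA[levelsBelow > 0 ? arity : 0];
            for (int i = 0; i < children.length; i++) {
                children[i] = new ObjectA(levelsBelow - 1, arity);
            }
        }

        // Counts this node plus all of its descendants.
        long count() {
            long n = 1;
            for (ObjectA child : children) {
                n += child.count();
            }
            return n;
        }
    }

    // Builds one tree and returns how many ObjectA nodes it contains.
    static long nodesPerTree(int levels, int arity) {
        return new ObjectA(levels, arity).count();
    }

    public static void main(String[] args) {
        System.out.println(nodesPerTree(7, 4)); // 21845
        System.out.println(43690000L / 2000);   // 21845 objects per thread in the report
        System.out.println(43690000L / 97999);  // 445 objects per msec in the report
    }
}
```

Under that assumption, one tree holds 1 + 4 + 16 + ... + 4^7 = 21845 nodes, which matches 43690000 objects divided by 2000 threads, and 43690000 objects over 97999 msec gives the reported 445 objects per msec.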
In a shared-memory multiprocessor setup, memory allocation requires some level of synchronization between CPUs. All you seem to be doing is allocating memory, without performing any calculations, so I think your results are to be expected. When the memory is not shared (single CPU), there is no contention.