I know that HashMap and Hashtable are vritually identical except that Hashtable is synchronized. This causes Hashtable to perform slower than Hashmap. Does anyone know if there is any performance difference between a Hashtable and the object that is returned from Collections.getSynchronizedMap(HashMap)?
It is generally understood that while a Hashtable is slightly slower than a Hashmap it is faster than a synchronised Hashmap. Once you have the object retrieved out of either one the performance of operations by or on that object are not affected by the way it was stored, so only the lookup and retrieval of the object will have different timing values after the initial storage. This is because not the object itself is stored but rather a pointer to the object and when you retrieve this you have a direct reference to the object and no longer need to access the Map with each operation on that object.
As I recall Jack Shirazi's java performance tuning book by O'Reilly discusses this and as the other poster said a synchronized HashMap is slightly slower than a HashTable, however the difference is minimal and I wouldn't be concerned about it for any code I've ever written. steve - http://www.jamonapi.com - a fast, free performance tuning, monitoring and scalability testing API.
Another diffreence is you can put null as key in HashMap but not in Hashtable HashMap hashMap=new HashMap(); hashMap.put(null,"NO ERROR");//Permissible But Hashtable hashtable=new Hashtable(); hashtable.put(null,"ERROR");//Error
I used the code below to check the speed difference between Hashtable and HashMap. I tried it on Solaris/Sparc and W2K/Pentium4 machines. On Solaris, as I expected, HashMap is faster (although only slightly). But on Windows, Hashtable is significantly faster! Can anyone try this on Linux & Solaris for x86?
I was skeptical of this, but I tried Yosi's program on Linux, and see the same thing. If you don't use a capacity when creating the objects, the performance gap becomes even wider -- almost a factor of 2! I've got OptimizeIt -- I'm going to try profiling this and see if I can figure out what's going on. OK, I did some checking with Optimizeit, and as often happens, the performance differences essentially vanish in that environment. I did do one very useful experiment. Note that the above program is not a very good test because the data doesn't arrive in random order. If you change the program to use a set of more random Strings, as shown below, and also move the time measurement around so it doesn't include the object allocation (which should actually magnify the differences) then two things happen. First of all, the performance for both Hashtable and HashMap increases a lot with the random data -- by a factor of two or so. Second, the performance difference all but vanishes (there's about a 3% difference, favoring Hashtable still.) Anyway, this is yet another reminder that microbenchmarks -- programs that pound on just one method and claim to be measuring some useful performance indicator -- are problematic.
[ October 27, 2003: Message edited by: Ernest Friedman-Hill ]
Like most performance discussions, this stuff isn't usualy worth worrying about unless you've profiled to determine where your bottlenecks really are - but some of us find the issues involved interesting anyway, so for them, here's more discussion: There was a related discussion back here. The last post where David quotes Josh Bloch is the key: the prroblem with the benchmark is that it looks at the special case of sequential Strings. The JDK 1.4 implementation of HashMap is a bit different from both the 1.3 HashMap and from Hashtable - 1.4 adds some code which basically scatters hashcodes more (pseudo)randomly within the buckets of the HashMap. This extra scattering is very beneficial for certain worst-case situations which were very slow in 1.3 (and Hashtable), but in some other situations it actually results in slower behaviour than in 1.3. Specifically, sequential Strings end up usually having sequential hash codes, which means there are very few collisions in a 1.3 HashMap for sequential Strings. But the additional hash scattering of 1.4 results in some extra collisions. Plus the scattering itself takes some computational time. However, the worst-case behavior of the 1.4 is much, much better than the worst-case behavior of the 1.3 code. Sequential cash code are at most maybe twice as bad on 1.4 as 1.3 - but it's still O(1) for each put() or get(). In comparison, there are cases in 1.3 where you get O(n) behavior for put() or get(). Here's a modified benchmark to try:
I switched from String to Integer because the hashing was simpler to understand (the int value is the hash). If the values are integer multiples of the map capacity, then the performance of Hashtable (or HashMap in 1.3) is truly terrible, while HashMap 1.4 does just fine. That is why they put the new scattering code in, even though in some cases it may make things a little slower than they were in 1.3.
Another thing this points out is that there's no such thing as the best hash table implementation. We really ought to have a couple dozen of 'em, with good docs discussing the subtle differences, and carefully pick the right implementation for a particular purpose, or at least have the option to do so if performance matters. If you've worked with hardware designers, you'll see how they have these thick catalogs of ICs (or used to... maybe it's all on the web these days). Algorithm implementations ought to be like that.
Guess I'm a johnny come lately to this discussion, but I have some interesting retorts to some of the performace topics that were discussed on this thread. Basically, I have an XML parser I've been trying to squeeze a couple milliseconds out of; so I've rolled my own stack and hashmap data structures. Both grow by powers of 2. The results I obtained were using a P4 @ 1.8GHz, J2SE 1.4.1, and J2EE 1.4.2. As suggested by Josh Bloch here I built my testing framework to run n timed runs, and take an average of all the runs. Also, I'm running the JVM with the -Xnoclassgc option. Also, I wrote 2 init functions, one to build a sequential list, and one to build a random list.
Accessing with a single thread, the synchronized keyword did not affect my performace.
Sequentially numbered keys are faster in all cases.
HashMap suffers a penalty of approximately 5% on random lists.
Hashtable suffers nearly a 40% penatly.
HashMap is approximately 5% faster than Hashtable on random lists.
Hashtable is about 35% faster than HashMap on sequential lists.
In all cases MY implemention was faster
32768 items, 30 runs, put(), remove():
If anyone's still interested, I could post the testing framework.
[ December 19, 2003: Message edited by: Shawn Curry ]