JavaRanch » Java Forums » Java » Performance

Hashtable vs Hashmap

Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
I'm seeing some strange behavior, and I was wondering if anyone had any insights.
As a test, I'm looping through, inserting, and retrieving 10,000 elements into a Hashtable and into a HashMap. Comparing the speeds for each,
I'm finding that the Hashtable operations are actually faster than the HashMap operations. Now, that doesn't make sense, because Hashtable is synchronized and HashMap is not. Has anyone run into this?
M, author
The Sun Certified Java Developer Exam with J2SE 1.4


Java Regular Expressions
Ron Newman
Ranch Hand

Joined: Jun 06, 2002
Posts: 1056
That is a surprising result. Could you post the code, so others can check it out on their JVMs and report back?


Ron Newman - SCJP 1.2 (100%, 7 August 2002)
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
That doesn't jibe with my own test results, as published here:
http://www.javaranch.com/newsletter/Aug2002/newsletteraug2002.jsp#collections


Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
I spoke to John Zukowski about this, and he suggested that it might have something to do with initial size or hotspot.
To that end, I'm running each test in its own VM. I haven't addressed the initial-size theory yet; I just wanted some feedback. I'm also using a TreeMap as a control, and it is behaving as expected.
These tests are on JDK 1.4.1, MS Windows 2000 Professional.
HashtableTest

HashMapTest

TreeMapTest

Thanks,
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
Also running on Win 2000 pro...
I'm getting the same results. The Hashtable always performs better than the HashMap, even at different trial sizes. Even stranger, adding an initial capacity and load factor makes the performance worse, not better.
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Hi Paul,
This is pretty weird. I'm inclined to think it's a 1.4 thing; I'm getting the expected results with 1.2. Are you as well?
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
You might try running this program:

When I run this I get these results:

If you compare the 1.4.1 source code for HashMap and Hashtable, HashMap certainly requires more calls to package-accessible methods for any given action -- but I don't have a copy of the 1.2/1.3 source to check against.
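The test programs referenced in this thread (HashtableTest, HashMapTest, TreeMapTest, and David's combined program) did not survive the archiving of this page. As a rough sketch only -- not the original code -- a benchmark matching the output labels quoted in the replies below might look like this. It is written with generics for readability (a 1.4-era original would have used raw types), and the key format follows Joshua Bloch's later description of "strings formed from sequential numbers":

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

public class MapBenchmark {
    static final int N = 10_000;

    // Insert and then retrieve N sequential-number string keys,
    // returning the elapsed wall-clock time in milliseconds.
    static long time(Map<String, Integer> map) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            map.put(Integer.toString(i), i);
        }
        for (int i = 0; i < N; i++) {
            map.get(Integer.toString(i));
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("total time for TreeMap: " + time(new TreeMap<>()));
        System.out.println("total time for Hashtable: " + time(new Hashtable<>()));
        System.out.println("total time for HashMap: " + time(new HashMap<>()));
        // "Presized" variants pass an initial capacity so no rehashing occurs.
        System.out.println("total time for Hashtable presized: " + time(new Hashtable<>(2 * N)));
        System.out.println("total time for HashMap presized: " + time(new HashMap<>(2 * N)));
    }
}
```

Note that a single-shot timing like this has all the pitfalls discussed later in the thread (JIT warm-up, GC interference); it is only meant to show the shape of the test.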
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
1.4.1 on Win 2K pro:
total time for TreeMap: 422
total time for Hashtable: 156
total time for HashMap: 312
total time for Hashtable presized: 109
total time for HashMap presized: 172
Max Habibi
town drunk
( and author)
Sheriff

Joined: Jun 27, 2002
Posts: 4118
Originally posted by Thomas Paul:
1.4.1 on Win 2K pro:
total time for TreeMap: 422
total time for Hashtable: 156
total time for HashMap: 312
total time for Hashtable presized: 109
total time for HashMap presized: 172


Interesting. I got the following.
total time for TreeMap: 801
total time for Hashtable: 301
total time for HashMap: 1081
total time for Hashtable presized: 311
total time for HashMap presized: 250
I'm not really sure what to make of all of this: I simply copied in your code. It seems that in this case, my Hashtable is slower, while your Hashtable is faster.
Curiouser and Curiouser.
[ October 14, 2002: Message edited by: Max Habibi ]
Sri Addanki
Ranch Hand

Joined: Apr 27, 2001
Posts: 195
Hi Max,
Interesting. I got the following results each time I compiled and ran the program:
total time for TreeMap: 901
total time for Hashtable: 340
total time for HashMap: 822
total time for Hashtable presized: 240
total time for HashMap presized: 290
total time for TreeMap: 881
total time for Hashtable: 331
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 290
total time for TreeMap: 872
total time for Hashtable: 320
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 301
total time for TreeMap: 881
total time for Hashtable: 321
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 291
Thanks,
sri
Sri Addanki
Ranch Hand

Joined: Apr 27, 2001
Posts: 195
Hi Max,
I'm sorry... this is in continuation of the above post. I forgot to mention that I ran the program on j2sdk1.4.1.
thanks,
sri
Ron Newman
Ranch Hand

Joined: Jun 06, 2002
Posts: 1056
On MacOS X 10.2.1 with Java 1.3.1:
total time for TreeMap: 1772
total time for Hashtable: 425
total time for HashMap: 346
total time for Hashtable presized: 640
total time for HashMap presized: 274
sandeep menon
Greenhorn

Joined: Oct 16, 2002
Posts: 3
on win2k with 1.3.1
I think the differences might be due to different rehashing algorithms being used in Hashtable and HashMap.
Giving an initial capacity in my test significantly reduced the time and in all cases HashMap was a little faster.
Hashtable - no initial capacity - 220ms
Hashtable - initial capacity 150000 - 70ms
HashMap - no initial capacity - 210ms
HashMap - initial capacity 150000 - 60ms
These are average times over several runs.
Guy Allard
Ranch Hand

Joined: Nov 24, 2000
Posts: 776
Hi - A Linux box, PIII 600:
1.3.1
total time for TreeMap: 1882
total time for Hashtable: 445
total time for HashMap: 309
total time for Hashtable presized: 714
total time for HashMap presized: 174
1.4.0
total time for TreeMap: 1086
total time for Hashtable: 385
total time for HashMap: 920
total time for Hashtable presized: 212
total time for HashMap presized: 362
Guy
Ron Newman
Ranch Hand

Joined: Jun 06, 2002
Posts: 1056
Sure looks to me like Sun broke something on the way from 1.3.1 to 1.4.
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
It seems a pretty old bug report on the topic was ignored:
http://developer.java.sun.com/developer/bugParade/bugs/4491461.html
Sayed Ibrahim Hashimi
Ranch Hand

Joined: May 17, 2001
Posts: 148
WinXP P4 2.26 1.4.0
total time for TreeMap: 313
total time for Hashtable: 94
total time for HashMap: 234
total time for Hashtable presized: 47
total time for HashMap presized: 94


SCJP 1.4 - www.cise.ufl.edu/~sih
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
It seems a pretty old bug report on the topic was ignored:
http://developer.java.sun.com/developer/bugParade/bugs/4491461.html

It's hard for us to see how relevant that example really is, since we can't see the test code used. The claim that the case is unduly weighted toward iteration could be valid -- HashMap and Hashtable aren't primarily used for iteration anyway, so a claim that iteration is too slow can justly be ignored if the new code gives better performance elsewhere. Unfortunately, the test runs here seem to indicate that this is not the case -- but Sun may not realize it yet.
Another (perhaps more relevant) bug report is at:
http://developer.java.sun.com/developer/bugParade/bugs/4669519.html
The discussion there is interesting, but it sounds like they believe the problem has been solved as of 1.4.1. Apparently that's not entirely true. Someone may wish to submit a new bug report with the test code - I believe this is more likely to be read and addressed than are comments appended to an already-closed bug report. Of course, considering the news about layoffs at Sun, this may not be their highest priority right now; dunno.
[ October 19, 2002: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
I submitted a bug report already, although I was a bit hasty and didn't say everything I should have (like why HashMap is expected to be faster than Hashtable in the first place). All I mentioned was comparative speed. But if you all submit duplicates, maybe they'll get the message.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I saw your message appended to the old report, but didn't find a new report from you. Maybe it's still being processed. When it's visible on the system, post a link here and we can all vote for it.
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
They say it'll take up to three weeks to process.
JohnDoe DoeJohn
Greenhorn

Joined: Nov 02, 2002
Posts: 5
Just a thought: you could use the same file for both tests, commenting out one or the other, by declaring the variable as a Map instead:

Always try to use the general interface where possible, such as List instead of Vector, ArrayList, or LinkedList. Some API methods require a Vector, for example, so you have to use one there; even then you can declare a List and cast to Vector where needed, keeping the flexibility to switch among ArrayList, Vector, and LinkedList should the API later change to accept the interface.
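The suggestion above can be sketched like this (the class and method names are illustrative, not from the lost original): everything programs against the Map interface, and the implementation under test is chosen in exactly one place.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class MapSwitch {
    // The concrete implementation is picked here; the rest of the
    // code only ever sees the Map interface.
    static Map<String, Integer> makeMap(boolean useHashtable) {
        return useHashtable ? new Hashtable<>() : new HashMap<>();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = makeMap(true);  // flip to false to test HashMap
        map.put("answer", 42);
        System.out.println(map.getClass().getName() + " -> " + map.get("answer"));
    }
}
```

The same idea applies to List versus Vector/ArrayList/LinkedList: declare the interface type, construct the concrete class in one spot.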
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
For performance testing it's best not to use the interfaces because you add a performance hit for the polymorphic behavior.
Ron Newman
Ranch Hand

Joined: Jun 06, 2002
Posts: 1056
But you're going to take that hit anyway, because neither HashMap nor Hashtable is final.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Ron Newman:
But you're going to take that hit anyway, because neither HashMap nor Hashtable is final.

The modern Hotspot Engine is intelligent enough to see that there are no subclasses of HashMap/Hashtable loaded, so that it doesn't have to take polymorphism into account. Cool, eh?


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
Originally posted by David Weitzman:
They say it'll take up to three weeks to process.

Still no word.
Tom Hughes
Ranch Hand

Joined: Feb 09, 2002
Posts: 86
Someone once told me that HashMap changed from an open hashing strategy (1.3) to a closed hashing strategy (1.4) -- or maybe vice versa; my memory is a bit hazy!
Could this explain the change in performance ?
T.
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
Hmmm. I think I've lost a link I had earlier. Anyway, check out http://www.javaspecialists.co.za/archive/Issue054.html and http://www.javaspecialists.co.za/archive/Issue054b.html
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Very informative; thanks for posting that. So if I understand correctly:
1. For some reason, some performance optimizations for doing / and % with bit-shifting and masking were lost in later JDKs. This made / and % slower than they had been.
2. In response they made the hash table's number of buckets a power of 2, which makes it really easy to perform % by masking, which is very fast.
3. Unfortunately, if the bucket count is a power of two, you can get really crappy performance if the hashCode() method doesn't have a good distribution in its last few bits.
4. It turns out many existing classes had inferior hashCode() methods in this respect.
5. So, to protect against abysmal performance in many common worst-case scenarios, they wrote a rehashing function that imposes a slight penalty on every hashCode(). Now the best hashCode() methods are a little slower, but the worst hashCode() methods are also only a little slower, rather than a LOT slower.
Which I guess makes some sense, in a twisted sort of way. But it seems as though if they just fixed the % operation in the first place, they could go back to the earlier code, right? Wouldn't that be preferable? Would the rehashing method offer any significant benefit if the bucket count weren't a power of two? Any ideas?
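The masking trick in step 2 can be sketched as follows. When the table length is a power of two, `h % length` reduces to a single bitwise AND for non-negative `h`, and a supplemental hash can scatter the low-order bits. Both functions below are paraphrases of the general idea, not copies of the JDK source:

```java
public class PowerOfTwoHashing {
    // For a power-of-two length, (h % length) == (h & (length - 1))
    // when h is non-negative -- one AND instruction instead of a division.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // A supplemental hash mixes high-order bits into the low-order bits,
    // so keys whose hashCode() values differ only in high bits don't all
    // land in the same bucket. (Illustrative mix, not the exact JDK function.)
    static int supplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 21, 37}) {
            System.out.println(h + " % " + length + " = " + (h % length)
                    + ", masked = " + indexFor(h, length));
        }
    }
}
```

Step 3 falls out of `indexFor` directly: only the low bits of the hash ever reach the index, so a hashCode() with poor low-bit distribution piles everything into a few buckets unless a supplemental hash intervenes.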
BTW David, I still couldn't find your bug report. Did you ever hear anything more about that? I have a free vote set aside for you.
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
But it seems as though if they just fix the damn % operation in the first place, they could go back to the earlier code, right? Wouldn't that be preferable?
Maybe we're both missing something, but I also think that rewriting HashMap is a pretty strange way to fix a mistake in the JVM.
David, I still couldn't find your bug report. Did you ever hear anything more about that?
Nope. Not a word.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Maybe someone is trying to bury issues that embarrass them.
I suppose we could write to Josh Bloch and try to pester him about it, but that might distract him from making sure all our favorite new features make it into 1.5. Don't want that to happen.
[ February 20, 2003: Message edited by: Jim Yingst ]
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
By any chance did you ignore the second link I posted? (So did I!) It seems that the % problem applies only to constant divisors.
That would suggest that better worst-case performance at the cost of average-case performance may be the only reason for the change in algorithm.
[ February 20, 2003: Message edited by: David Weitzman ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Hey, I read it. You're right that the division-with-constants problem may in fact be irrelevant here. When I first read it, I imagined that a JIT might consider the infrequently-changing bucket count "close enough" to a constant that it could perform the optimization anyway. But on further consideration, that seems less likely. I know a JIT pays attention to which sections of code run frequently, to figure out which should be optimized; I have a harder time imagining it also tracking the values of frequently-used variables within that code, and deciding whether they are sufficiently "close" to constant to optimize (while also inserting checks to detect when the "constant" changes and must be re-optimized). I can sorta imagine ways this could be done, but it seems like a lot of work for dubious results; I doubt it's something that was in earlier JDKs and subsequently dropped. Rather, it seems like something that would have been held out of earlier versions until enough time had been spent to verify that it produced useful results (if indeed it's even possible) -- then it would be included in the JDK and not subsequently dropped. Of course, this is based on my vast experience working with Sun's JIT engineers. :roll: (I.e., I'm pretty much just making this up as I go, but why let that stop me?)
My only other reason for thinking these optimizations were connected to the HashMap issue is that from the articles, it seems as though Josh Bloch thought there might be a connection. And even though the reasoning given in the article doesn't seem very convincing, Josh could well know more about it than was covered in the brief conversation recounted in the article. Or maybe not, of course.
Joe Cheng
Greenhorn

Joined: Feb 23, 2003
Posts: 11
Hey everyone, be careful to remember that garbage collected runtimes are highly non-deterministic--your timing runs could be highly affected by GC kicking in. At the very least I think each timed operation should be preceded with System.gc() and Thread.sleep(10000).
Also as others have mentioned, Hotspot will make the early runs look bad, so put the whole body of the main method in a for loop and wait for the numbers to stabilize.
And set sufficiently large -Xmx and -Xms so that the heap doesn't need to grow during your test.
Doing this with the code from this thread, I still see HashMap being slower than Hashtable (that's on Win32 1.4.0_01; on 1.3.1_02 they're neck and neck), but only by about 10%, and it's relatively consistent. The numbers stabilize after the second or third iteration.
If you're seeing 100% or 1000% differences in performance, I strongly suspect Java's memory management is throwing a wrench into your numbers... try running with -verbose:gc enabled and see what happens.
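Joe's three suggestions -- collect garbage before each timed run, loop until the numbers stabilize, and fix the heap size -- can be combined into one harness. The workload, iteration counts, and sleep duration below are placeholder choices, not values from the thread:

```java
import java.util.HashMap;
import java.util.Map;

public class WarmedUpBenchmark {
    // A placeholder workload: time one round of inserts into a fresh HashMap.
    static long timeOnce() {
        Map<String, Integer> map = new HashMap<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100_000; i++) {
            map.put(Integer.toString(i), i);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // Run with a fixed heap so it never grows mid-test, e.g.:
        //   java -Xms256m -Xmx256m -verbose:gc WarmedUpBenchmark
        for (int run = 0; run < 10; run++) {
            System.gc();        // request collection outside the timed region
            Thread.sleep(100);  // give the collector a moment to finish
            long t = timeOnce();
            // Early runs include JIT compilation; watch for later runs to stabilize.
            System.out.println("run " + run + ": " + t + " ms");
        }
    }
}
```

System.gc() is only a hint to the VM, so this reduces rather than eliminates GC noise; -verbose:gc shows whether collections still land inside the timed region.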
David Weitzman
Ranch Hand

Joined: Jul 27, 2001
Posts: 1365
I wrote (I forget when):

You're probably aware of this, but Hashtable seems to outperform HashMap (at least for the general case) in the Java 1.4 and 1.4.1. There has been a little bit of confusion over at the JavaRanch Performance forum where we've been discussing the issue.

Joshua Bloch wrote (Feb 24):

Hi. This is the first I've heard of this. I looked into it briefly and I believe I know what's going on. I don't have time to write it up tonight, but I'll try to do it soon. For starters, the microbenchmark is not good. It should time the test repeatedly so that you can make sure you're getting a repeatable result. Timing the first run gives you startup transients, which are not what you want. Also you have to figure out how you want to deal with garbage collection: either eliminate it from the timings by calling System.gc() outside of the timing loop, or time runs long enough to predictably include GC.
That said, Hashtable really is faster than HashMap for the data set in this benchmark: strings formed from sequential numbers. The pre 1.4 HashMap implementation, which was retained in Hashtable, does very well on sequential keys because it doesn't scatter them at all: it puts them in sequential buckets. As a result, there are *no* collisions. The price you pay is that certain regular but non-consecutive sequences will all hash to the same bucket, resulting in horrible performance. The 1.4 version (and especially the 1.4.1 version) scatters the keys pseudo-randomly, resulting in occasional collisions when hashing consecutive sequences like the one in the microbenchmark. (The mathematics of this is pretty simple; you can easily calculate roughly how many collisions you'll get.) To test this conjecture, use pseudo-random numbers in the keys instead of consecutive. I did, and found that HashMap is a bit faster than Hashtable, as expected.
Loosely speaking, then, the 1.4.1 implementation is a better general purpose implementation. If you know that you'll be dealing with a consecutive sequence, the 1.3 implementation (still present in Hashtable) could well be faster. On the other hand, for consecutive keys you could just use an array (or List), which would be *really* fast : )
I hope this brief letter was at all comprehensible.

From the "I don't have time to write it up tonight, but I'll try to do it soon" I got the impression that he would write a better microbenchmark or something in the next day or two and post it here, so I held off responding. I may have misinterpreted, but either way I wish I'd sent a "thanks" note in a timely manner.
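Bloch's conjecture -- sequential string keys favor the old (Hashtable) algorithm, pseudo-random keys favor the 1.4.1 HashMap -- can be tested by swapping the key set while keeping the timing loop fixed. This is a sketch of that experiment, not his code; names and counts are invented:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.Random;

public class RandomKeyBenchmark {
    static final int N = 10_000;

    // Build the key array up front so key generation isn't part of the timing.
    static String[] keys(boolean sequential) {
        Random rnd = new Random(42);  // fixed seed, so runs are repeatable
        String[] ks = new String[N];
        for (int i = 0; i < N; i++) {
            ks[i] = Integer.toString(sequential ? i : rnd.nextInt());
        }
        return ks;
    }

    static long time(Map<String, Integer> map, String[] ks) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) map.put(ks[i], i);
        for (int i = 0; i < N; i++) map.get(ks[i]);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        for (boolean seq : new boolean[] {true, false}) {
            String[] ks = keys(seq);
            System.out.println((seq ? "sequential" : "random    ")
                    + "  Hashtable: " + time(new Hashtable<>(), ks)
                    + "  HashMap: " + time(new HashMap<>(), ks));
        }
    }
}
```

Per Bloch's explanation, the sequential row should favor Hashtable (its pre-1.4 algorithm puts consecutive keys in consecutive buckets with no collisions), while the random row should favor the 1.4.1 HashMap.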
 