Hi everybody, I have a doubt regarding java.util.HashMap performance. If I add more entries into a HashMap, will there be any performance penalty in fetching the values? If yes, what is the maximum number of entries that we can put in a HashMap? Expecting an answer from somebody. Thanks, Micheal
Given a good hash function, a HashMap will do inserts and lookups in constant time no matter how many objects are in the map. The HashMap will also resize itself, so the maximum size is limited only by memory. [ January 22, 2003: Message edited by: Jon Strayer ]
posted 17 years ago
Hi Jon, a good hash function? How do we know what java.util.HashMap uses internally? Please describe it in more detail. Micheal
The answer to your question depends on the scenario. A hashing algorithm, usually some form of modulus, is designed to produce (as much as is reasonably possible) a consistent way of finding/assigning a position within the hash table based on an input factor (in this case the Object's hashCode). The whole point of a hash table is to provide relatively consistent performance for accessing data. It does this by repeating the same hashing algorithm to compute a position within the internal array when looking an object up as it originally did when putting the object into the array in the first place. So in the best-case scenario, a hash table should not drop in performance based on its size.

But there are a few factors one must consider on top of this. No hashing algorithm is perfect; sometimes two different hash codes will 'hash' to the same number (which represents the element's position in the array). This is called a hash code collision. When this happens, the hash table essentially creates a linked list at that position in the array, attaching the new element to the existing element's tail. So here you have two elements at the same position. The implication is that when one of those objects gets requested, the hash table will find multiple objects in the same position and will need to do a node-by-node comparison to determine the proper object to return.

The problem of collisions in hash tables is generally insignificant and usually does not represent any major performance concern. But as the previous poster said, a better hashing algorithm will yield a better result by avoiding collisions.

Another issue that will affect performance is the concept of growing the hash table. java.util.Hashtable and java.util.HashMap both sport an automatic 'growing' feature. This means you can keep filling them with data to your heart's content.
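To make the idea concrete, here is a minimal sketch of how a hash table turns a key's hashCode into an array position and chains colliding keys in the same bucket. This is an illustration only, not the actual java.util.HashMap source (which uses a power-of-two table with bit masking rather than a plain modulus); the class and method names here are made up for the example.

```java
import java.util.LinkedList;

public class BucketSketch {
    // Map a hash code onto an index in an internal array of the given
    // capacity. Math.floorMod avoids negative indices for negative hash codes.
    static int bucketIndex(int hashCode, int capacity) {
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        int capacity = 8;
        // Simulated internal table: each slot holds a chain of colliding keys.
        @SuppressWarnings("unchecked")
        LinkedList<String>[] table = new LinkedList[capacity];
        String[] keys = {"alpha", "beta", "gamma", "delta", "epsilon"};
        for (String key : keys) {
            int i = bucketIndex(key.hashCode(), capacity);
            if (table[i] == null) table[i] = new LinkedList<>();
            table[i].add(key);  // a collision appends to the existing chain
        }
        for (int i = 0; i < capacity; i++) {
            System.out.println("bucket " + i + ": "
                    + (table[i] == null ? "[]" : table[i]));
        }
    }
}
```

A lookup would repeat the same bucketIndex computation and then walk the short chain at that slot, which is why the table's total size does not matter as long as the chains stay short.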
But this comes with a price: when the hash table grows, it creates a new internal array (of larger size) and then rehashes all the elements from the old array into the new array. It needs to do this because one of the factors in the hashing algorithm is the size of the internal array. Avoiding this problem comes down to careful planning. If you know that your HashMap will only contain, say, 20 elements, then you would be prudent to initialize your HashMap like this: Map myMap = new HashMap(20, 1.0f); This will cause the HashMap to create an internal array with room for 20 elements. The second parameter is the load factor of the array. In this case I have chosen to make the load factor 100%, which means the HashMap will not grow until it exceeds 100% capacity. The default load factor for HashMap and Hashtable is 0.75, which means that even if you perform a 'new HashMap(20)', it will still grow automatically once it passes 15 elements. But to reiterate: hash code collisions aside, the access time associated with fetching an object from a table of 50 elements versus a table of 400 elements is unchanged. Hence the point of hashing: to provide consistent performance. Regards, Mike.
Hi Mike, that was a nice explanation, thanks. So irrespective of the number of fetches (say, in a JSP I fetch different keys from the same Hashtable 30-40 times), the performance should not be affected. Please let me know if I am wrong. Micheal
Michael - no, the time required for a single fetch is (on average) constant, regardless of the number of elements in the Map. However, 100 fetches will take approximately 100 times as long as a single fetch. Also, from Mike's post:
If you know that your HashMap will only contain say, 20 elements, then you would be prudent to initialize your HashMap like this: Map myMap = new HashMap(20, 1.0f); This will cause the HashMap to create an array internally with room for 20 elements
This is a bit misleading. I would usually want to set the capacity a bit higher than the maximum number of elements expected in the HashMap, because this will reduce the number of collisions in the map.

Imagine it this way: if the capacity is set to 20 (meaning the internal array has size 20), and you insert 20 elements, each element will be assigned a position in the array according to the element's (hashCode() % 20) - which (for a good hash function) is like a random number. (Unless two elements are equal - assume they're not.) Now there's a chance that all 20 elements will end up in completely different array positions. But much more likely, some will end up in the same position. This is called a collision. Certainly if a 21st element were added, we'd be guaranteed that at least one collision would occur. But even for 20, it's extremely likely that several elements will by chance end up sharing array positions. The hash table will then store a linked list at that array position, containing all the elements sharing it. Whenever someone needs to access one of these elements, the system will loop through the linked list to find the right element. This adds to the HashMap's execution time. Thus, collisions can be resolved, but we try to avoid them.

If the expected number of elements to be stored in the HashMap is 20, setting the capacity to something like 40 will greatly reduce the number of collisions, and thus improve performance. It's a classic memory vs. speed tradeoff of course - but 20 elements is pretty insignificant on most machines, so I'd use the extra memory here without a second thought.
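You can simulate the effect of capacity on collisions with a quick sketch. This uses random numbers as stand-ins for hash codes (an assumption: real hash codes from a good hash function behave roughly like random numbers, as described above); the class and method names are made up for the example.

```java
import java.util.Random;

public class CollisionDemo {
    // Count how many of n simulated hash codes land in an
    // already-occupied bucket of a table with the given capacity.
    static int countCollisions(int n, int capacity, long seed) {
        boolean[] occupied = new boolean[capacity];
        Random rnd = new Random(seed);
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            // Random int stands in for a key's hashCode.
            int bucket = Math.floorMod(rnd.nextInt(), capacity);
            if (occupied[bucket]) collisions++;
            else occupied[bucket] = true;
        }
        return collisions;
    }

    public static void main(String[] args) {
        // 20 keys into capacity 20 vs. capacity 40: the roomier table
        // will usually show fewer collisions.
        System.out.println("capacity 20: "
                + countCollisions(20, 20, 42) + " collisions");
        System.out.println("capacity 40: "
                + countCollisions(20, 40, 42) + " collisions");
    }
}
```

By the pigeonhole principle, 21 keys in a 20-slot table always produce at least one collision, matching the guarantee mentioned above.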
The HashMap uses the number generated by the hashCode() method. So if you override the equals() method then you must override the hashCode() method; otherwise you could have equal objects that are not recognized as equal by the HashMap. The default behavior is to generate a hash code based on the object's address, which works with the default behavior of the equals() method.
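A minimal example of keeping that contract: a key class that overrides both equals() and hashCode(), so a distinct-but-equal key still finds the entry. (The PointKey class is invented for illustration; Objects.hash is a Java 7+ convenience.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PointKey {
    final int x, y;
    PointKey(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof PointKey)) return false;
        PointKey p = (PointKey) o;
        return x == p.x && y == p.y;
    }

    // Equal objects must produce equal hash codes; otherwise the map
    // may look in the wrong bucket and fail to find the entry.
    @Override public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "found");
        // A different-but-equal instance locates the value only because
        // hashCode() was overridden along with equals().
        System.out.println(map.get(new PointKey(1, 2))); // prints found
    }
}
```

If hashCode() were left at its default, the second PointKey(1, 2) would hash to a different bucket and get() would return null.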
Originally posted by Thomas Paul: The HashMap uses the number generated by the hashCode() method. So if you override the equals() method then you must override the hashCode() method; otherwise you could have equal objects that are not recognized as equal by the HashMap. The default behavior is to generate a hash code based on the object's address, which works with the default behavior of the equals() method.
You are correct that the Object's hashCode is used by the HashMap, but the hashCode itself is run through the hashing algorithm to produce an array position. Also, Jim was correct to point out my 'misleading' advice: a larger hash table will always yield fewer hash code collisions. Thanks for pointing that out, Jim. Mike.