Mike Simmons wrote:Yes, exactly. Not just "for faster retrieval", but to be retrieved at all. A HashMap doing a get(key) will calculate the hashCode() of the key and look in the appropriate bucket for that hash code. If it doesn't find a matching key in that bucket, it's not going to look in any other buckets - it will simply treat the key as "not found". So if the hashCode() is not consistent with equals(), you can put objects into a hashmap, and they may never be found again, even though they are present in the HashMap.
Mike Simmons wrote:It's not exclusive. The bucket can contain other entries with the same hash code, but not equal - because that's allowed by the hashCode contract. Furthermore, a given bucket is usually for a group of hash codes, e.g. all hash codes for which hash_code % some_modulus is the same value. So there can certainly be other entries in the bucket. But the goal is that usually, there's a relatively small number of entries in the same bucket.
Monica Shiralkar wrote:Why not each hashbucket dedicated to objects which are equal as per the equals override condition?
So Hashing has nothing to do with faster performance?
Although that is probably not the primary intention of calculating a hash code, using a hash code does allow faster insertion and retrieval in hash‑based data structures. That applies particularly if you use immutable objects as keys and they cache their hash codes. Insertion might necessitate calculating a hash code, but subsequent use allows plain return of a field. Once the hash code is known, the bucket can be found very quickly by calculating h & c - 1, where h is the hash code, after rehashing if any, and c the capacity of the backing array.
Monica Shiralkar wrote:. . . Hashing has nothing to do with faster performance?
Monika Shiralkar wrote:Why not each hashbucket dedicated to objects which are equal as per the equals override condition?
Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
Tim Holloway wrote:Traditionally, the best size for a hashtable (assuming you hadn't computed a Perfect Hash) was an odd, preferably prime number. But whatever works.