Should objects that are equal per the overridden equals() method be in the same hash bucket for fast retrieval?

 
Monica Shiralkar (Ranch Foreman):
As per the equals/hashCode contract, if we override the equals() method then we should override the hashCode() method too. Is the reason for this that if two objects are equal as per the condition in the overridden equals() method, then they should be stored in the same hash bucket for faster retrieval? Thanks.
 
Mike Simmons (Master Rancher):
Yes, exactly. Not just "for faster retrieval", but to be retrieved at all. A HashMap doing a get(key) will calculate the hashCode() of the key and look in the appropriate bucket for that hash code. If it doesn't find a matching key in that bucket, it's not going to look in any other buckets - it will simply treat the key as "not found". So if the hashCode() is not consistent with equals(), you can put objects into a HashMap, and they may never be found again, even though they are present in the HashMap.
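
A minimal sketch of that failure mode, assuming a made-up key class that overrides equals() but not hashCode():

import java.util.HashMap;
import java.util.Map;

// Hypothetical key class for illustration: equals() is overridden,
// hashCode() is not, so it falls back to identity hash codes.
class BrokenKey {
    final String name;
    BrokenKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).name.equals(name);
    }
    // hashCode() deliberately NOT overridden
}

public class LostKeyDemo {
    public static void main(String[] args) {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey("monica"), "value");

        // equals() says this key is equal to the stored one, but its
        // identity hash code is (almost certainly) different, so get()
        // searches the wrong bucket and reports "not found":
        System.out.println(map.get(new BrokenKey("monica"))); // almost always null
    }
}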
 
Paul Clapham (Marshal):
That's approximately right, but it has nothing to do with "faster". If you have an object with hash code A in a hash set, and you search that set with an equal object with hash code B, then it's possible you won't find the object in the hash set at all.
 
Monica Shiralkar (Ranch Foreman):
Thanks. Is each hash bucket exclusively for objects which are equal as per the condition in the equals() method, or does it also contain other objects which are not equal as per the overridden equals() method?
 
Mike Simmons (Master Rancher):
Sorry - the first version of my reply just said "yes, exactly" - then I edited it to add the caveat. By the time I'd finished, Paul had already posted that point. So even though mine appears first in the thread, Paul actually made the point first.
 
Mike Simmons (Master Rancher):
It's not exclusive.  The bucket can contain other entries with the same hash code, but not equal - because that's allowed by the hashCode contract.  Furthermore, a given bucket is usually for a group of hash codes, e.g. all hash codes for which hash_code % some_modulus is the same value.  So there can certainly be other entries in the bucket.  But the goal is that usually, there's a relatively small number of entries in the same bucket.
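
A toy illustration of that grouping, with a made-up table size: three distinct hash codes, none of them from equal objects, all reduce to the same bucket:

public class BucketGroupingDemo {
    public static void main(String[] args) {
        int someModulus = 16;               // toy table size
        int[] hashCodes = {5, 21, 37};      // distinct hash codes from non-equal objects
        for (int h : hashCodes) {
            // 5 % 16, 21 % 16 and 37 % 16 are all 5,
            // so all three entries share bucket 5
            System.out.println(h + " -> bucket " + (h % someModulus));
        }
    }
}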
 
Monica Shiralkar (Ranch Foreman):

Mike Simmons wrote:Yes, exactly. Not just "for faster retrieval", but to be retrieved at all. . . .

Thanks. So Hashing has nothing to do with faster performance?
 
Monica Shiralkar (Ranch Foreman):

Mike Simmons wrote:It's not exclusive. The bucket can contain other entries with the same hash code, but not equal . . .

So a hash bucket can also have objects other than the ones which are equal as per equals(). But why are other objects allowed? Why isn't each hash bucket dedicated to objects which are equal as per the overridden equals() condition?
 
Paul Clapham (Marshal):

Monica Shiralkar wrote:Why isn't each hash bucket dedicated to objects which are equal as per the overridden equals() condition?

Suppose you add 10,000 objects which are all not equal to each other. If each bucket contains only one object then you're going to create 10,000 buckets. And to decide which bucket a new object belongs in, you'd essentially be comparing it to each of the objects already there, which is something you don't need buckets for. This is probably one of the least effective methods.

Monica Shiralkar wrote:So Hashing has nothing to do with faster performance?

The usual response to that is "Faster than what?" And yes: hashing is faster than comparing each new item to all of the items which are already there.

However, it's possible that the caller is repeatedly trying to add existing items to the hash table - for example, distilling a large collection of mostly duplicate items into a small set of all-different items. That is a different challenge. But people who design hash tables assume that they don't know anything about the objects which are going to be added and subsequently searched for, so their designs try to work well for all situations. Since people have been using hash tables for decades now, it's safe to assume that the hash tables you're using are well-designed for almost every situation. And yes, "faster" is one of the most important things those designers have focused on.
 
Marshal:

Monica Shiralkar wrote:. . . Hashing has nothing to do with faster performance?

Although that is probably not the primary intention of calculating a hash code, using a hash code does allow faster insertion and retrieval in hash-based data structures. That applies particularly if you use immutable objects as keys which cache their hash codes. Insertion might necessitate calculating a hash code, but subsequent use allows plain return of a field. Once the hash code is known, the bucket can be found very quickly by calculating h & (c - 1), where h is the hash code (after rehashing, if any) and c is the capacity of the backing array.
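
A rough sketch of such an immutable key, caching its hash code the way java.lang.String does (class and field names are made up), followed by the bucket calculation described above:

// Immutable key that computes its hash code once and then
// returns the cached field on every later call.
final class CachedKey {
    private final String first, second;
    private int hash;                       // 0 means "not yet computed"

    CachedKey(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedKey)) return false;
        CachedKey k = (CachedKey) o;
        return first.equals(k.first) && second.equals(k.second);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {                       // first call: compute and cache
            h = 31 * first.hashCode() + second.hashCode();
            hash = h;
        }
        return h;                           // later calls: plain field return
    }

    public static void main(String[] args) {
        int h = new CachedKey("a", "b").hashCode();
        int c = 16;                         // capacity: a power of two
        System.out.println("bucket " + (h & (c - 1)));
    }
}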
 
Mike Simmons (Master Rancher):
I think that using hashCode() is done primarily for performance - it allows a HashMap to be much faster than searching a list or array one element at a time. It's also usually faster than searching a tree or sorted list, though those are more comparable in speed. My point, when I said "not just for faster retrieval", was that faster retrieval is not the only reason you need to follow the contract of hashCode() and equals(). If you violate the contract, your HashMap probably doesn't work at all. But speed is generally a primary benefit of using a HashMap or HashSet as opposed to some other structure.
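
A very rough timing sketch (not a proper benchmark) of that difference, comparing one linear scan with one hash lookup:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupTimingDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
            set.add(i);
        }

        long t0 = System.nanoTime();
        list.contains(999_999);     // linear scan: examines every element
        long t1 = System.nanoTime();
        set.contains(999_999);      // hash lookup: jumps straight to one bucket
        long t2 = System.nanoTime();

        System.out.println("list scan:   " + (t1 - t0) + " ns");
        System.out.println("hash lookup: " + (t2 - t1) + " ns");
    }
}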
 
Mike Simmons (Master Rancher):

Monica Shiralkar wrote:Why isn't each hash bucket dedicated to objects which are equal as per the overridden equals() condition?

That's a nice goal, but it's difficult and often impossible to guarantee - unless you know in advance all the values that will be put in the map or set. (See "perfect hash function" for more about that possibility.) In general, though, we use hash codes in places where we don't necessarily know in advance all the keys that will be added. So we want a hashCode() that will probably, generally, produce "unique" (or at least rarely repeated) values for each distinct key it's used for. And consider something like a String, which has far more than 2^32 distinct values - many more than an int hashCode() can possibly represent. It's not possible to create a hashCode() for String that gives every distinct sequence of characters its own unique value. That's true for many other classes as well. So sometimes different keys will end up in the same bucket, even though we try to avoid it.
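
For instance, "Aa" and "BB" are a well-known pair of distinct Strings that happen to share a hash code, so any map holding both must fall back on equals() inside the bucket:

public class StringCollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112 - same bucket, not equal
        System.out.println("Aa".equals("BB")); // false
    }
}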
 
Tim Holloway (Saloon Keeper):
My memory, as usual, is hazy, but I'm pretty certain that the hashCode() method for an object does NOT, by itself, generally determine the bucket used in a HashMap.

Consider: hashCode() returns one of approximately 4 billion values (did I count the bits right?). But when you construct a HashMap, if you don't specify an initial capacity, only 16 buckets are created. To hash additions into a 16-bucket table, you need a value in the range 0-15, not 0-4 billion.

So you effectively have two hash values. One determines the bucket; the original hashCode() has the virtue that, when searching a bucket's overflow chain, you can simply compare hashCode() values and immediately discard ineligible entries (hash codes unequal). For hashCode() matches, you then need to use the equals() method to ensure it's not just a hash-only match and is in fact a true match.
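
A rough sketch of that chain search (all names made up, though real HashMap entries do store the full hash code alongside the key in much this way):

// Each stored entry remembers the full 32-bit hash code from insertion
// time, so the overflow chain can be walked with a cheap int comparison
// before resorting to equals().
class Entry<K, V> {
    final int hash;
    final K key;
    V value;
    Entry<K, V> next;   // next entry in the same bucket's chain

    Entry(int hash, K key, V value, Entry<K, V> next) {
        this.hash = hash; this.key = key; this.value = value; this.next = next;
    }

    static <K, V> V find(Entry<K, V> chainHead, int hash, K key) {
        for (Entry<K, V> e = chainHead; e != null; e = e.next) {
            // int comparison first: discards most non-matches cheaply
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;    // not in this bucket, therefore not in the map
    }
}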
 
Mike Simmons (Master Rancher):
Correct - I alluded to this above with "Furthermore, a given bucket is usually for a group of hash codes, e.g. all hash codes for which hash_code % some_modulus is the same value." With HashMap there's also a supplemental step. The javadoc for HashMap's internal hash() method says:
Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
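
In recent JDKs the body of that hash() method is essentially the following one-liner, shown here with a small demonstration using Float keys (whose hash codes for small whole numbers differ only in the upper bits):

public class SpreadDemo {
    // Essentially the spreading step quoted above: XOR the top 16 bits
    // of hashCode() into the bottom 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        for (float f = 1f; f <= 4f; f++) {
            int raw = Float.hashCode(f);
            // The raw codes all mask to bucket 0 in a 256-bucket table;
            // after spreading, most of them separate.
            System.out.println(f + ": raw bucket " + (raw & 255)
                    + ", spread bucket " + (hash(f) & 255));
        }
    }
}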


 
Tim Holloway (Saloon Keeper):
Traditionally, the best size for a hash table (assuming you hadn't computed a perfect hash) was an odd, preferably prime, number. But whatever works.
 
Saloon Keeper:

Tim Holloway wrote:Traditionally, the best size for a hash table (assuming you hadn't computed a perfect hash) was an odd, preferably prime, number. But whatever works.

I believe that was true for hash tables that resolved hash-value collisions by skipping through the array by some fixed prime number of hops until the first empty bucket was found. I implemented several of these, based on a data structures and algorithms book, many decades ago. The preferred approach these days seems to be a table size that is a power of 2, with a bit mask applied to the hash code to get the bucket number; the bucket then contains a list of all the collisions, which would hopefully be just one entry.
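
A toy sketch of that older open-addressing scheme (all constants made up; there is no resize logic, so it assumes the table never fills):

// On a collision, step through the array by a fixed prime stride until
// an empty slot (or the key itself) is found. Table size and stride are
// coprime, so the probe sequence eventually visits every slot.
public class OpenAddressingDemo {
    private static final int SIZE = 31;     // prime table size
    private static final int STEP = 7;      // prime probe stride
    private final Object[] slots = new Object[SIZE];

    void put(Object key) {
        int i = Math.floorMod(key.hashCode(), SIZE);
        while (slots[i] != null && !slots[i].equals(key)) {
            i = (i + STEP) % SIZE;          // skip to the next candidate slot
        }
        slots[i] = key;
    }

    boolean contains(Object key) {
        int i = Math.floorMod(key.hashCode(), SIZE);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return true;
            i = (i + STEP) % SIZE;
        }
        return false;
    }

    public static void main(String[] args) {
        OpenAddressingDemo table = new OpenAddressingDemo();
        table.put("alpha");
        table.put("beta");
        System.out.println(table.contains("alpha")); // true
        System.out.println(table.contains("gamma")); // false
    }
}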
 