Overriding equals(Object) and hashCode()

 
Javier Gonzalez
Greenhorn
Hi!
I read that to implement the equals method and the hashCode method we have to use the "same set of fields". If we have, for example:
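
Something like this hypothetical Person class (a sketch; the fields name and age are assumed from the discussion):

import java.util.Objects;

public class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // equals() compares BOTH name and age...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // ...but hashCode() uses ONLY name
    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}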

...and we use name and age to override equals, but just name to override hashCode... Why is that wrong? And why should we use the same fields in both methods?
Thank you for the help!
 
Stephan van Hulst
Saloon Keeper
If you put a bunch of people in a HashSet, all people with the same name will go into the same bucket, even if they have different ages. So when you ask the set "Do you contain a 'John' with age 48?", the set will possibly first iterate over "John" with age 12, "John" with age 33 and "John" with age 84 before it can give you an answer.

In short, it's not *terrible*, but it will limit the performance benefits you get from collections that depend on the hash code, like HashSet and HashMap.
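
With a Person class like the sketch in the question (same assumed fields), that lookup looks like this:

import java.util.HashSet;
import java.util.Set;

public class BucketDemo {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("John", 12));
        people.add(new Person("John", 33));
        people.add(new Person("John", 84));

        // All three share a hash code (same name), so they share a bucket.
        // contains() must walk that bucket, calling equals() on each entry:
        System.out.println(people.contains(new Person("John", 48))); // false, after up to three equals() checks
    }
}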
 
Javier Gonzalez
Greenhorn
Thank you very much Stephan! I got it! I couldn't find the answer on several websites!
Thank you!
 
Campbell Ritchie
Marshal
If you do it the other way round, however, with name only in equals() and both fields in hashCode(), you will break the general contract for hashCode() and your hash‑based collections may not work correctly.
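
A hypothetical sketch of that broken reversal (equals() on name only, hashCode() on name and age):

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class BrokenPerson {
    final String name;
    final int age;

    BrokenPerson(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // equals() looks only at name...
    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenPerson && Objects.equals(((BrokenPerson) o).name, name);
    }

    // ...but hashCode() mixes in age as well: a contract violation
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}

class BrokenDemo {
    public static void main(String[] args) {
        Set<BrokenPerson> set = new HashSet<>();
        set.add(new BrokenPerson("John", 12));
        // equals() says this is the same person, but it hashes to a different bucket,
        // so the set very likely fails to find it:
        System.out.println(set.contains(new BrokenPerson("John", 48))); // very likely false
    }
}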
 
Javier Gonzalez
Greenhorn
Hi Campbell! So the contract breaks because we use more fields in hashCode() than in equals(), but not when we use both in equals() and just one in hashCode()?
 
Tim Holloway
Saloon Keeper

Javier Gonzalez wrote:Hi Campbell! So the contract breaks because we use more fields in hashCode() than in equals(), but not when we use both in equals() and just one in hashCode()?



It's valid for two objects to have the same hashCode but not be "equals". It is not valid for two objects to be equals but have different hashCodes.

Ideally, no two objects won't have the same hashCode, since that can make hash-based searches to be less efficient. But the virtue of a hash code is that it's quicker to compare than an item-by-item test. The price is that, since a hash is a trimmed-down value, two non-identical objects can hash to the same value; the hashing process discards information.
 
Monica Shiralkar
Ranch Hand

Tim Holloway wrote:
Ideally, no two objects won't have the same hashCode, since that can make hash-based searches to be less efficient.



But isn't this happening all the time in the common scenario where, say, a class has name and age, both of these are in the overridden equals method, hashCode is also written, and we pass an object of such a class while retrieving from the HashSet/HashMap?
 
Jesse Silverman
Ranch Foreman
I'm not trying to be a grammar purist; I think this was a typo:

Ideally, no two objects won't have the same hashCode, since that can make hash-based searches to be less efficient.



In standard English, we hope no two objects will have the same hashCode(), yes?
 
Campbell Ritchie
Marshal

Tim Holloway wrote:. . . Ideally, no two objects won't have the same hashCode . . .

. . . but that ideal is often impossible to achieve, especially if the range of different objects possible from one class has a cardinality > 2³².
 
Tim Holloway
Saloon Keeper

Jesse Silverman wrote:
In standard English, we hope no two objects will have the same hashCode(), yes?



Hope, yes. But as Campbell says, "ideal" is often impossible. Hash codes are by definition approximations.
 
Monica Shiralkar
Ranch Hand
Are the below correct?

1) No 2 objects would have the same hashcode (unless, not only do we override hashcode and equals but also hard code the hashcode , which is a bad practice).
2) If we override the hashcode and equals, then it will not have same hashcode but one thing is for sure that these objects will land in the same bucket.
 
Jesse Silverman
Ranch Foreman
Alright, maybe I was being a grammar purist, but people are trying to remember rules for long-term use. I for sure am.

We wish that all distinct objects might wind up with distinct hashCode() values.
That may or may not actually happen; if it does, terrific; if not, well, the collections code will go ahead and call .equals().

The double negative arising from "won't" ("We don't need no stinking badges") adds extra emphasis in many languages, but in standard English the two negatives cancel each other out [though not in Pink Floyd songs]. Like "not indistinct" = "distinct".

"We hope no two distinct objects will have the same hashCode() value" / "we hope no two distinct objects won't have *different* hashCode() values."

That first phrasing would be best. The .hashCode()/.equals() contract requires that no two objects that return true for .equals() will return different .hashCode() values.

Or: the contract requires that objects that return true for an .equals() comparison won't yield different .hashCode() values.

I've also noted that crazy things can happen if you use a custom Comparator<> that looks at more fields than your class's .equals() considers: a TreeSet or TreeMap could violate the no-duplicates rule in terms of .equals() if you do that, because they use the ordering from the Comparator to place and to locate things, while the Set no-duplicates rule is defined in terms of .equals().
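
A hypothetical sketch of that last trap (equals() uses only name, but the Comparator also looks at age):

import java.util.Comparator;
import java.util.Objects;
import java.util.TreeSet;

class NamedThing {
    final String name;
    final int age;

    NamedThing(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedThing && Objects.equals(((NamedThing) o).name, name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class InconsistentComparatorDemo {
    public static void main(String[] args) {
        // The Comparator considers MORE fields than equals() does:
        TreeSet<NamedThing> set = new TreeSet<>(
                Comparator.comparing((NamedThing t) -> t.name)
                          .thenComparingInt(t -> t.age));
        set.add(new NamedThing("John", 12));
        set.add(new NamedThing("John", 48)); // accepted: compare() != 0, even though equals() is true
        System.out.println(set.size());      // 2 -- "duplicates" in terms of equals()
    }
}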
 
Jesse Silverman
Ranch Foreman

Monica Shiralkar wrote:Are the below correct?

1) No 2 objects would have the same hashcode (unless, not only do we override hashcode and equals but also hard code the hashcode , which is a bad practice).
2) If we override the hashcode and equals, then it will not have same hashcode but one thing is for sure that these objects will land in the same bucket.



1) No, there can not only be multiple things landing in the same bucket (which can be reduced by increasing the initial capacity and decreasing the load factor), but the values returned by .hashCode() can also collide and be identical, though hopefully much less frequently.

2) Also no, distinct objects will rarely have identical .hashCode() values, but it will eventually happen, sooner if we don't pick a good .hashCode() implementation for the class. Even more commonly, they will land in the same bucket.

How often the .hashCode() values are the same for distinct objects is a function of your class's data and of the .hashCode() implementation you write. Some people are very fussy about getting it just right to minimize collisions; I don't know when that is premature optimization and when it is just good coding.

How often distinct values of .hashCode() land in the same bucket has to do with the parameters of the HashSet or HashMap you create; think about the effect of each of the optional parameters in the constructor:

HashSet​(int initialCapacity, float loadFactor)

One way to interpret the question "Do you know how a HashSet or HashMap works?" is whether you understand how those two values help you avoid overloading buckets with distinct objects that have distinct hashCode() values, at the cost of throwing extra memory at the problem.
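
For instance (a sketch; the 16384 and 0.5f are made-up numbers):

import java.util.HashSet;
import java.util.Set;

// 16384 buckets up front; the set resizes once it passes 16384 * 0.5 = 8192 entries.
// More buckets for the same number of entries means fewer entries sharing each
// bucket, at the cost of a larger backing array.
Set<String> names = new HashSet<>(16384, 0.5f);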

Lastly, while in the old days lots of bucket collisions in large hash-based data structures might be bad for performance, modern versions of Java switch to a tree to represent a bucket if more than 7 items land in that bucket. I think the cost of re-hashing when you exceed the load factor would still be higher than with a wonderful distribution of distinct .hashCode() values.
 
Paul Clapham
Marshal

Jesse Silverman wrote:2) Also no, distinct objects will rarely have identical .hashCode() values, but it will eventually happen, sooner if we don't pick a good .hashCode() implementation for the class.



Here's a challenge for you: how many different String values can there be? A String can have any length from 0 to Integer.MAX_VALUE, and each character in a String can be any value between 0 and 2^16-1. You'll find the number is beyond astronomical. And yet all of those different String objects must share a space of only 2^32 possible hash codes. If you play around a bit, it isn't hard to find short Strings which are different but have the same hash code.
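
"Aa" and "BB" are a famous such pair; you can check it in a couple of lines:

public class StringCollision {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112 as well: different Strings, same hash code
    }
}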

And it's quite common to use String values as part of the equals/hashCode calculations in Java classes. So the hope of different objects having different hash codes is very small; just do your best with the part of the calculation which doesn't involve Strings, because the String part is out of your control.
 
Jesse Silverman
Ranch Foreman
There are at least four things we can do about the fact that the sheer number of possible Strings entails many String.hashCode() collisions:

1. As you point out, work to ensure that the contributions of the other fields included in the calculation often result in distinct .hashCode() values.
2. Play with the capacity and load factor to minimize the impact of collisions on lookup time by giving away some memory. <-- doesn't help if the hashCodes are literally identical....
3. Don't stress so much; unless we have bad luck and huge hash structures, we shouldn't have enough collisions to cause bad performance.
4. I forgot what I was thinking, because you made me just realize something else that didn't make sense to me 'til now...

I think I understand what some of the programmers who are OCD about .hashCode() implementations are saying right now, in terms of where they get literal bits of difference.
They are talking about how often you will have two different .hashCode() values but wind up in the same bucket anyway.  You want to make sure the lower order bits have a good distribution to minimize this.

I will presume the hashCode-obsessed either have need of very performant hash structures crammed full with tons of data or are just perfectionists, but now I see what they were getting at.
Sure, these two .hashCode() values were distinct, but where were the bits of difference? Do the hashCodes get shifted or mixed further before being used to choose a bucket? If not, I want as many distinct bit patterns as possible in the lower bits, since I am unlikely to have millions of buckets...
 
Paul Clapham
Marshal

Jesse Silverman wrote:I think I understand what some of the programmers who are OCD about .hashCode() implementations are saying right now, in terms of where they get literal bits of difference.
They are talking about how often you will have two different .hashCode() values but wind up in the same bucket anyway.  You want to make sure the lower order bits have a good distribution to minimize this.



Or you could use an odd number of buckets. Certainly if your number of buckets is a power of two, then that puts a huge weight on the lower-order bits. I don't know whether making the odd number N prime would improve the step where taking a "random" number modulo N produces an even distribution over the range 0 to N-1; a bit of number theory would help there. I'm sure a lot of people have considered that over the last 50 years.
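
For what it's worth, OpenJDK's HashMap (an implementation detail, not something the specification promises) keeps a power-of-two table and compensates by folding the high bits into the low bits before masking:

// Paraphrased from OpenJDK's HashMap: spread the high bits downward,
// then mask with (table length - 1) to pick a bucket.
static int spread(int h) {
    return h ^ (h >>> 16);
}
// bucketIndex = (tableLength - 1) & spread(key.hashCode()); // tableLength is a power of two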
 
Jesse Silverman
Ranch Foreman
Thanks Paul!

I can summarize some of my general problems thusly.

I have already decided that my long-haul Java education will include the Great Books like "Effective Java", but I am going for certification, and hopefully a new job, sooner than I will get to those.

But there's a lot of low-quality material out there that I now avoid consulting because it is only partially correct; the bar seems to be low for "I guess I know enough to do a tutorial page or video".

There is some great-quality stuff out there that I have been consulting, which quotes chapter and verse from the "Great Books" it reminds you that you should eventually read.

So my focus is on learning both what is on the cert exams and the other things that will cause you to shoot yourself in the foot and wind up with broken code if you ignore them.

Regarding the stated topic of this thread and the stuff I've posted in it: I would say you'd be surprised how many people employed in a day job as Java professionals ignore or forget parts of that, except that you have much more exposure to confused people than most do, so probably not.

It is a tightrope act, gathering knowledge from sources that are rarely wrong without going way too deep into things like writing really great .hashCode() and .equals() overrides rather than just avoiding broken code.

As a result I am aware of most areas where there is more to be learned, but can't always apply that knowledge in practice.

I think I have now done this particular topic to death, with the exceptions of writing truly great .hashCode() overrides, and of .equals() behavior in the face of sub-classes that add new data elements, both of which go beyond where I am trying to get for now.

Would you agree that understanding the overloaded constructors for HashMap and HashSet is probably the next level up beyond understanding the .hashCode()/.equals()/.compareTo()/.compare() contract? I believe a superficial understanding of them is in scope for the exams I am currently targeting.
 
Campbell Ritchie
Marshal

Monica Shiralkar wrote:Are the below correct?

1) No 2 objects would have the same hashcode (unless, not only do we override hashcode and equals but also hard code the hashcode , which is a bad practice).

Incorrect. We have already said it is desirable for two unequal objects to have different hash codes, but that cannot always be achieved.

2) If we override the hashcode and equals, then it will not have same hashcode but one thing is for sure that these objects will land in the same bucket.

Not sure I understand that, but it feels incorrect, too.
 
Campbell Ritchie
Marshal

Jesse Silverman wrote:. . . Like "not indistinct" = "distinct". . . .

Too tired to go into the grammar, but such double negatives aren't quite exact equivalents to the affirmative.

. . . Would you agree that understanding the overloaded constructors for HashMap and HashSet is probably the next level up beyond understanding the .hashCode()/.equals()/.compareTo()/.compare() contract? . . .

No. The constructors don't mention equals() or hashCode(). Similarly for HashSet; I would have thought those constructors are easy enough to learn, and I don't know whether they are in the exam syllabus.
As you said in your previous post, the principal factor for good performance in a hash‑based data structure is an even distribution of hash codes from its keys or elements. That depends on the implementation of the keys, not on the data structure. If your array sizes are powers of two, then the low‑order bits are particularly useful for choosing a bucket. You can sort that out by using prime‑number bucket counts, but then the % operator is needed, and it executes more slowly than bitwise operators.

. . . a TreeSet or TreeMap could violate non-duplicate rules in terms of .equals() . . . the Set no-duplicates rule is in terms of .equals().

The documentation for TreeSet specifically tightens the general contract for Set by insisting the comparison technique be “consistent with equals,” which means you are using compare() (or compareTo()) as a surrogate for equals().

. . . there's a lot of low-quality material out there . . .

Isn't there just.
 
Tim Holloway
Saloon Keeper
In Java, hashCode can potentially return any of approximately 4.3 billion values. A good hashCode has an equal probability of returning any of those potential values for a given object instance. A poor hashCode will favor certain return values over others.

The algorithm you use to compute hashCode is entirely up to you, both in parts and in sum. If you don't like java.lang.String's hashCode, you can compute the string-related part of your hash according to whatever algorithm feels good. String itself, of course, is a final class, so you cannot override the hashCode method of java.lang.String itself.

There is an art and a science to computing hash values, so it's best to be well-researched on the subject - AND to test the algorithm rigorously. As is the case with "genius" security, stuff that looks clever to its designer is often pretty weak.
 
Monica Shiralkar
Ranch Hand

Campbell Ritchie wrote:Incorrect. We have already said it is desirable for two unequal objects to have different hash codes, but that cannot always be achieved



But how come  it being 'desirable for two unequal objects to have different hash codes but not always being achieved'  means that the below statement of mine is incorrect?

"No 2 objects would have the same hashcode (unless, not only do we override hashcode and equals but also hard code the hashcode , which is a bad practice"

Not sure I understand that, but it feels incorrect, too.



I meant that if we override the hashcode and the equals method in a class, then will all objects of this class be in the same hash bucket if stored in a HashSet/HashMap?



 
Stephan van Hulst
Saloon Keeper

Monica Shiralkar wrote:But how come  it being 'desirable for two unequal objects to have different hash codes but not always being achieved'  means that the below statement of mine is incorrect?

"No 2 objects would have the same hashcode (unless, not only do we override hashcode and equals but also hard code the hashcode , which is a bad practice"


You say that no two objects would have the same hash code. That's impossible because there is an almost infinite number of distinct objects, but a limited number of distinct hash codes.

I meant that if we override the hashcode and the equals method in a class, then will all objects of this class be in the same hash bucket if stored in a HashSet/HashMap?


Of course not. What would be the point if all objects ended up in the same bucket? Objects with different hash codes are preferably distributed uniformly over different buckets.

Objects with the same hash code will always end up in the same bucket. So do your best to give different objects different hash codes.
 
Monica Shiralkar
Ranch Hand
Thanks

Of course not. What would be the point if all objects ended up in the same bucket? Objects with different hash codes are preferably distributed uniformly over different buckets.

Objects with the same hash code will always end up in the same bucket. So do your best to give different objects different hash codes.



Right. Understood.


Stephan van Hulst wrote:
You say that no two objects would have the same hash code. That's impossible because there is an almost infinite number of distinct objects, but a limited number of distinct hash codes.



Understood that what I said was wrong as hashcode is not a unique number.
And had that been a unique number all objects of this class would have landed in the same bucket.
 
Stephan van Hulst
Saloon Keeper
No no no.

If each object returned a unique (== distinct) number, they would be distributed over different buckets fairly uniformly (but not necessarily).

It's really not that hard: Objects with the same hash code end up in the same bucket. Other than that, there are no guarantees.
 
Monica Shiralkar
Ranch Hand
Thanks

Stephan van Hulst wrote:Objects with the same hash code end up in the same bucket. Other than that, there are no guarantees.



So if "Objects with the same hash code end up in the same bucket" then why would they be in different bucket in case of same hard coded hashcode?
 
Jesse Silverman
Ranch Foreman
Hi Monica:

The contract promises only that you will never have two "identical" objects in terms of .equals() that yield different hashCode() values.

If you ever did that, you wouldn't merely have a poorly performing data structure; it would be totally broken, as you could stuff multiple copies of the "same" value into a supposedly duplicate-free structure.

That is an absolute minimum, which would be legally satisfied by hard-coding the same value for everything to be returned by your .hashCode().

If you did that, you could never get a "broken" structure accidentally containing duplicates, but you would get very poor performance because every item would get stuffed into the same bucket, losing the super-fast lookup times to find things.
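
In code, that degenerate version is literally just:

// Legal per the equals()/hashCode() contract, terrible for performance:
@Override
public int hashCode() {
    return 42; // every instance lands in the same bucket
}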

As long as that is satisfied, and it must be true always (you NEVER have two .equals() items returning different .hashCode() values; that is the absolute minimum requirement), then:
the less often your hashCode() returns the same value for different objects, the better performance you will get out of your data structure.

People who are very smart, or who think they are, will try to craft just the right .hashCode() implementation to do this.
They seem not to be satisfied with the ones auto-generated by their IDE (Eclipse, NetBeans, etc.) because they think they can do even better.

I haven't seen any suggestions that the auto-generated ones won't work; they will follow the .hashCode()/.equals() contract. You can ensure that yourself by checking that your .hashCode() is not based on any MORE data elements than your .equals() uses. If it is, you can get two DIFFERENT .hashCode() values for two items that would compare true for .equals().

Hopefully, 100% of the people trying to fine-tune their .hashCode() functions for good performance already "know" such work is necessary for their use from data analysis (seeing too many collisions and a poor distribution across the buckets) and are keenly aware of how HashMap and HashSet insert and maintain the data in the buckets.

It might have been premature to watch a video or read a full description of "how these work internally" when you first asked the question, but to see why some people are obsessed with writing "great" .hashCode() implementations for their classes, and not just accepting legal but sub-optimal ones, you definitely want to watch or read one of those full explanations.

On the other hand, once you already know "to fulfill the .hashCode()/.equals() contract, I will never forget to also override .hashCode() when I am writing the .equals() for my class, and I will never let it consider more data in my object than the .equals() does", you are getting into the dicey world of trying to improve upon good, valid, safe and legal in quest of great performance.

Additional to all this, however, is the statistical fact that in technical interviews, "explain to me how HashMap<> and HashSet<> work internally" is a common question used to see if you really understand things well.

Awareness of that fact drives people who are not yet sure of even relatively simple, more basic things to ask questions in this area.

Until quite recently, the .hashCode()/.equals() contract was in scope for the OCJP exam.
To my knowledge, the quest to write .hashCode() overrides giving truly excellent performance was always the realm of people going for the best, at least where real analysis of their program's performance seemed to merit it: the people who compare the performance of other minor changes in their code with real data under real conditions, or the people who just can't resist the temptation to always do things better.

I have been reminded by many posts on this forum that this can often be counterproductive if one knows less than one thinks about the real data at runtime.

I think a good minimum skill is this: if I show you a class that has overridden hashCode() and equals() and tell you it is being used as a key in a HashMap or as an element in a HashSet, you should be able to see whether someone has accidentally violated the contract and broken the use of the class in these.

I brought up the fancier things because, by the time you get to the people giving instruction and advice in the wild, a lot of those who correctly tell you what you need to do to be correct and safe then start showing you how you "can do even better".

That is why I asked whether the next level beyond just meeting the contract meant understanding how to make good use of the overloaded constructors specifying initial capacity and load factor.

You NEVER need to change those to have a valid working data structure, but tweakers and performance junkies will be fond of suggesting you use them to get better performance for large and heavily accessed hash-based data structures. They basically trade space for a better distribution across buckets and less heavily loaded buckets, by giving you more buckets for a given number of entries.
 
Monica Shiralkar
Ranch Hand

Jesse Silverman wrote:Hi Monica:

which would be legally satisfied by hard-coding the same value for everything to be returned by your .hashCode().

If you did that, you could never get a "broken" structure accidentally containing duplicates, but you would get very poor performance because every item would get stuffed into the same bucket, losing the super-fast lookup times to find things.



Yes, but if "Objects with the same hash code end up in the same bucket", then why would they be in different buckets in the case of the same hard-coded hashcode?
 
Jesse Silverman
Ranch Foreman

Monica Shiralkar wrote:

Yes, but if "Objects with the same hash code end up in the same bucket", then why would they be in different buckets in the case of the same hard-coded hashcode?



Yes, agreed: if you hard-coded a constant value for hashCode(), every single entry would always end up in the exact same bucket all the time.

It would be legal and valid in terms of the .hashCode()/.equals() contract.

But it would be a terribly performing implementation that would not yield the performance characteristics normally expected from a HashMap/HashSet.

In fact, for small data, or if the structure was only accessed lightly once in a while, it wouldn't even matter; you would never notice the difference.

If you were storing and retrieving data to and from it very frequently, it would make your program perform poorly -- yet still be correct.

That is why I was drawing a stark distinction between what you MUST do to get correct results from your program that can be relied on, and what it is nice to do if you need or want good performance from your data structure.

The title of this thread nominally covers both aspects: correct, valid use, and making it perform well.
 
Monica Shiralkar
Ranch Hand

Jesse Silverman wrote: if you hard-coded a constant value for hashCode(), every single entry would always end up in the exact same bucket all the time.



But as per the below, it would be in different buckets:

Stephan van Hulst wrote:

If each object returned a unique (== distinct) number, they would be distributed over different buckets fairly uniformly

 
Jesse Silverman
Ranch Foreman
He meant that if each distinct object returned a different hashCode(), they would (mostly) wind up in different buckets; no more, no less.
 
Monica Shiralkar
Ranch Hand
Thanks. I understood why; it was because I had written it incorrectly.
 
Monica Shiralkar
Ranch Hand
I had written the below:

Monica Shiralkar wrote:
Understood that what I said was wrong as hashcode is not a unique number.
And had that been a unique number all objects of this class would have landed in the same bucket.



which is wrong.

A correct statement would have been:

If in a class hashCode is overridden with a hard-coded value and the equals method is also overridden, all objects of this class will land in the same bucket.




But if the count of such objects is high, can so many objects really land in the same bucket?

 
Stephan van Hulst
Saloon Keeper
Yes. In such a case, the map or set essentially turns into a linked list.
 
Monica Shiralkar
Ranch Hand

Stephan van Hulst wrote:Yes. In such a case, the map or set essentially turns into a linked list.



The map/set essentially turning into a linked list is further complicating my understanding of equals and hashCode.
 
Monica Shiralkar
Ranch Hand

Stephan van Hulst wrote:Yes. In such a case, the map or set essentially turns into a linked list.



Does that mean that even if one creates thousands of objects of such a class (which implements equals and hashCode with a hard-coded hashCode), all will be accumulated in a single hash bucket?
 
Stephan van Hulst
Saloon Keeper
Yes, and individual buckets are usually implemented by linked lists.
 
Jesse Silverman
Ranch Foreman
Now we get into the implementation detail that in all modern Java implementations (since version....7 or 8?) the bucket actually switches to a tree as soon as the count of items in it hits 8.

I think that tells us they wanted to preserve fairly good access either for very large hash structures, or because less-than-ideal hashCode() implementations meant that things wound up in the same bucket more often than one would see ideally.
 
Campbell Ritchie
Marshal

Jesse Silverman wrote:. . . (since version....7 or 8?) . . .

Not 7+; either 8+ or 9+.
That's an implementation detail which shouldn't be permitted in exams. I don't know whether it is on the syllabus.

. . . very large hash structures

No: you have a load factor, and once it is exceeded, the structure is enlarged with more buckets, so in theory the number of entries per bucket is reduced. The arithmetic or bit‑twiddling needed to find the bucket number is unchanged and should execute at the same speed. Remember, arrays support random access.

. . . less-than-ideal hashCode() implementations . . .

Yes, that is the reason. The documentation for HashMap doesn't say anything about trees; the following is the nearest it gets to it:-

. . . when keys are Comparable, this class may use comparison order among keys to help break ties. . . .

That sentence appears in the Java 8 version and not in the Java 7 version, suggesting it was in Java 8 that they changed to a tree. And that change may only apply when the keys implement Comparable. You would have to find the source code to be certain about that.
 
Jesse Silverman
Ranch Foreman
Thanks Campbell!

I now see that if there is a good distribution for hashCode(), the automatic re-hashings on hitting the capacity × load-factor limit will take care of everything for us, which suggests they were bailing out usages with poor distributions rather than just large structures.

The bit about tie-breaking for Comparable objects is interesting, especially since objects don't need to be Comparable to go into these structures.

That's even a little more than I thought I wanted to know about the subject, but admittedly it is a pretty important topic.

I have written interesting programs that revolved entirely around reading and writing HashMaps many times, and, at least in programming challenges, HashSets as well.

Now if everyone would also always remember that updating an element already stored in a HashSet, or used as a key in a HashMap, will break the whole thing -- and by "everyone" I mean *me* -- we'd be great.
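
A hypothetical sketch of that trap, with a made-up MutablePoint class:

import java.util.HashMap;
import java.util.Map;

class MutablePoint {
    int x, y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutablePoint
                && ((MutablePoint) o).x == x
                && ((MutablePoint) o).y == y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class MutatedKeyDemo {
    public static void main(String[] args) {
        Map<MutablePoint, String> map = new HashMap<>();
        MutablePoint p = new MutablePoint(1, 2);
        map.put(p, "somewhere");

        p.x = 99; // hashCode() changes, but the entry stays in its old bucket

        System.out.println(map.get(p));         // very likely null
        System.out.println(map.containsKey(p)); // very likely false, though p is "in" the map
    }
}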

I have a friend who thought immutability was only important for multi-threaded programs; this is one of the biggest attractions of immutability in single-threaded programs for me.
 
Stephan van Hulst
Saloon Keeper
Aye. There's a host of benefits to immutable objects. That's why some languages don't allow you to modify your variables/records/objects at all.
 