I have a little doubt regarding the efficiency of this hashCode() implementation -:
Yes, I know that it is a legitimate implementation that is bound by the contract.
However, note how equals() declares objects y and a equal because they have the same x-value. If I use y and a to map two different items, will they not be placed in the same bucket?
Does that not make this implementation slightly inefficient? I think we would prefer it if we get different hashcodes as much as possible...I just want to confirm this, because I'm revising and this keeps bothering me.
PS : This was a simple example from the book, of course it can be inefficient, and this is just my own doubt.
Normally, hashcode and equal are used to implement own equal rules. unique value field is used in hashcode calculation. hashcode is used in hashing api like hashmap.
As per equal and hashcode contract if two object are equal then their hashcode must be same and they will go in same bucket.
There is some discrepancy in the code. I think correct implementation should be like this:
Correct me if I am wrong.
Now coming to the point of hashing, according to this implementation the code will always return same hash value, if we supply similar integers inside this function call, hence all those entries will hash to the same bucket.
and based on the specification if two objects are equal according to equals method then hashCode must return true. Although this implementation will separate the objects having different value but still is inefficient as for large values supplied to the method the hashCode will return larger values and either we need large bucket or any other method (eg modulus %) to identify correct bucket. Hence a lot of values will coincide in the same bucket. And finding for equality will be inefficient if hashCode is inefficiently implemented.
If you want to override Object class' hashCode method, then it should not accept any parameter (e.g. int x) - otherwise, it would be overloading.
So, correct way to write hashCode method is:
Going further, it will break the hashCode-equals contract (because, since you've not overriding Object's hashCode method, your hashCode method will never be called by get/put method of Map etc.).
That is why, it is necessary to use correct signatures for hashCode and equals method (e.g. equals method accepts Object as an argument, but lot of time, it happens that people provide current class' object as an argument - i.e. object of HasHash).
I think, the implementation of hashCode() given above is not performance friendly. the main goal of hashing it to map, lets say m elements to n buckets. where m >>>>>>n;
Above implementation enforces n to be equals to m, also each bucket contains at most of one element. Search, insert, delete functionalists of Hash Table are equals to O(1) normally. Above implementation can take more than constant time for basic operations mentioned before.
Don Redd wrote:I think, the implementation of hashCode() given above is not performance friendly. the main goal of hashing it to map, lets say m elements to n buckets. where m >>>>>>n;
I don't think so. I would say the best hashing function would be the one which maps m elements to n buckets, where m is approximately equal to n.
That way, equals method would have a very less load.
If m >> n, then one bucket would contain too many elements, and equals method would have to do a lot of work.
if m==n, then finding the right bucket will take time, even though equals() is O(1). So,What I meant is, we have to implement hash function in a way that 1) finding right bucket and 2)comparing each element in bucket, are load balanced.
Don Redd wrote: if m==n, then finding the right bucket will take time, even though equals() is O(1). So,What I meant is, we have to implement hash function in a way that 1) finding right bucket and 2)comparing each element in bucket, are load balanced.
No, it doesn't have to search for bucket itself.
The normal flow is:
hashCode method returns the hashCode. The hashCode itself is treated as id of the bucket, and that bucket is directly accessed.
Further, if the bucket is containing more than one element, all the elements are checked for equals method.
So, finding the bucket itself is not much of a hassle. The real performance drop happens when one bucket contains too many elements.