# Purpose of List.hashCode()

posted 2 years ago

I can understand the purpose of an Object having a unique

`hashCode()`so that it goes into the correct bucket, searching operations are more efficient and the Collection classes behave in a predictable manner. But what is the purpose of List.hashCode()? Is it to search a List from a collection of Lists? Could someone provide some real life example when one might use it?~ Mansukh

posted 2 years ago

The hashCode() method is needed for objects that you put in a HashSet or objects that you use as keys in a HashMap. If, for example, you have a HashSet that contains Lists, then the hashCode() method of the Lists will be called to organize them in the internal data structure of the HashSet.

*All*objects have a hashCode() method, because hashCode() is defined in class java.lang.Object, which is the superclass of all classes. So, since a List is also an object, it ofcourse also has a hashCode() method.

The hashCode() method is needed for objects that you put in a HashSet or objects that you use as keys in a HashMap. If, for example, you have a HashSet that contains Lists, then the hashCode() method of the Lists will be called to organize them in the internal data structure of the HashSet.

posted 2 years ago

Knew this part.

This is what I wanted to know. Lists go into separate buckets of the HashSet according to the hashCode() that is calculated for each List object according to the elements' hashCode() that are contained in the List. Thanks Jesper.

Jesper de Jong wrote:Allobjects have a hashCode() method, because hashCode() is defined in class java.lang.Object, which is the superclass of all classes. So, since a List is also an object, it ofcourse also has a hashCode() method.

Knew this part.

Jesper de Jong wrote:The hashCode() method is needed for objects that you put in a HashSet or objects that you use as keys in a HashMap. If, for example, you have a HashSet that contains Lists, then the hashCode() method of the Lists will be called to organize them in the internal data structure of the HashSet.

This is what I wanted to know. Lists go into separate buckets of the HashSet according to the hashCode() that is calculated for each List object according to the elements' hashCode() that are contained in the List. Thanks Jesper.

~ Mansukh

posted 2 years ago

Another way to think of them as is as a "digest" for the List - ie, if two Lists have the same hashcode, then there's a fair chance that they contain the same values; if the hashcodes are different, you

Winston

Mansukhdeep Thind wrote:This is what I wanted to know. Lists go into separate buckets of the HashSet according to the hashCode() that is calculated for each List object according to the elements' hashCode() that is contains in the List.

Another way to think of them as is as a "digest" for the List - ie, if two Lists have the same hashcode, then there's a fair chance that they contain the same values; if the hashcodes are different, you

*know*that they can't.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

Campbell Ritchie

Sheriff

Posts: 48386

56

posted 2 years ago

How is this possible Winston? The List.hashCode() method is implemented as :

How can 2 different Lists have same hash code? Each element of each List will have a unique hash code, and hence the List itself will have a unique hash code obtained by adding 31*

Winston wrote: ... if two Lists have the same hashcode, then there's a fair chance that they contain the same values

How is this possible Winston? The List.hashCode() method is implemented as :

How can 2 different Lists have same hash code? Each element of each List will have a unique hash code, and hence the List itself will have a unique hash code obtained by adding 31*

`hashcode`to the previous summation as shown above. Am I wrong in thinking so? The above hashCode() overriding seems correct to me.

~ Mansukh

posted 2 years ago

There's nothing strange about that. It's simple to prove that there must be different lists with the same hash code: hashCode() returns a 32-bit int, which means that there are 2^32 = 4,294,967,296 possible hash codes. And there are certainly more than 4,294,967,296 possible different lists, so there must be different lists that have the same hash code.

Given the above implementation of List.hashCode(), it's easy to show an example with two different lists that have the same hash code:

Not necessarily.

Remember the hashCode / equals contract: If a.equals(b) == true, then a.hashCode() == b.hashCode() must be true. But not necessarily the other way around; if a.hashCode() == b.hashCode(), that does not mean that a.equals(b) == true.

Mansukhdeep Thind wrote:How can 2 different Lists have same hash code?

There's nothing strange about that. It's simple to prove that there must be different lists with the same hash code: hashCode() returns a 32-bit int, which means that there are 2^32 = 4,294,967,296 possible hash codes. And there are certainly more than 4,294,967,296 possible different lists, so there must be different lists that have the same hash code.

Given the above implementation of List.hashCode(), it's easy to show an example with two different lists that have the same hash code:

Mansukhdeep Thind wrote:Each element of each List will have a unique hash code

Not necessarily.

Remember the hashCode / equals contract: If a.equals(b) == true, then a.hashCode() == b.hashCode() must be true. But not necessarily the other way around; if a.hashCode() == b.hashCode(), that does not mean that a.equals(b) == true.

posted 2 years ago

Yes, but it could just as easily have been implemented as:

a perfectly legal - albeit pointless -

The basic idea of a good hashcode (and there are plenty of

Think about it: A String containing only 3 characters already encapsulates more bits of data than there are in an

HIH

Winston

Mansukhdeep Thind wrote:How is this possible Winston? The List.hashCode() method is implemented as :

blah

Yes, but it could just as easily have been implemented as:

`return 42;`

a perfectly legal - albeit pointless -

`hashCode()`implementation for ANY class. And I think Jesper's covered the rest.

The basic idea of a good hashcode (and there are plenty of

*bad*ones around, believe me) is to spread values

*evenly*across the buckets of a hashed collection. A related (but

__not__identical) property of this requirement is that different values

*tend*to produce different results; but it's absolutely impossible to guarantee.

Think about it: A String containing only 3 characters already encapsulates more bits of data than there are in an

`int`- and that doesn't even take into consideration the

`length`attribute, which would probably be a good idea to include in its hashcode as well.

HIH

Winston

Articles by Winston can be found here

posted 2 years ago

Understood.

Ohhh! OK Now it makes sense why the contract is the way it is. If it were the other way round then it would be the perfect hashing function and there wouldn't be such a fuss about this contract between

One last thing Jesper, can't we use prime numbers to formulate such a unique hashing function such that every time we add the element to a hashing collection class object, it goes into a different bucket? Is it impossible or just that it hasn't been achieved yet by mathematicians?

Jesper de Jong wrote:Mansukhdeep Thind wrote:How can 2 different Lists have same hash code?

There's nothing strange about that. It's simple to prove that there must be different lists with the same hash code: hashCode() returns a 32-bit int, which means that there are 2^32 = 4,294,967,296 possible hash codes. And there are certainly more than 4,294,967,296 possible different lists, so there must be different lists that have the same hash code.

Understood.

Jesper de Jong wrote:Mansukhdeep Thind wrote:Each element of each List will have a unique hash code

Not necessarily.

Remember the hashCode / equals contract: If a.equals(b) == true, then a.hashCode() == b.hashCode() must be true. But not necessarily the other way around; if a.hashCode() == b.hashCode(), that does not mean that a.equals(b) == true.

Ohhh! OK Now it makes sense why the contract is the way it is. If it were the other way round then it would be the perfect hashing function and there wouldn't be such a fuss about this contract between

`equals()`and

`hashCode()`.

One last thing Jesper, can't we use prime numbers to formulate such a unique hashing function such that every time we add the element to a hashing collection class object, it goes into a different bucket? Is it impossible or just that it hasn't been achieved yet by mathematicians?

~ Mansukh

posted 2 years ago

It's impossible, for all the reasons given already: There are simply too many possibilites to

Prime numbers are indeed used a lot, but simply as a method of "shuffling" bits, not to guarantee uniqueness. Indeed, you'll see 17 and 31 used quite a lot, since many optimizing compilers are aware that

and

as well as them both having good "bit shuffling" characteristics when it comes to

If you're really interested in this stuff, I suggest you check out this site, because the guy (I think his name is Bob Jenkins) has probably forgotten more about hash algorithms than you or I are ever likely to know.

Winston

- 1

Mansukhdeep Thind wrote:One last thing Jesper, can't we use prime numbers to formulate such a unique hashing function such that every time we add the element to a hashing collection class object, it goes into a different bucket? Is it impossible or just that it hasn't been achieved yet by mathematicians?

It's impossible, for all the reasons given already: There are simply too many possibilites to

*guarantee*uniqueness.

Prime numbers are indeed used a lot, but simply as a method of "shuffling" bits, not to guarantee uniqueness. Indeed, you'll see 17 and 31 used quite a lot, since many optimizing compilers are aware that

`n * 17 == (n << 4) + n`

and

`n * 31 == (n << 5) - n`

as well as them both having good "bit shuffling" characteristics when it comes to

`int`s.

If you're really interested in this stuff, I suggest you check out this site, because the guy (I think his name is Bob Jenkins) has probably forgotten more about hash algorithms than you or I are ever likely to know.

Winston

Articles by Winston can be found here

posted 2 years ago

Well, one thing I discovered for myself when reading it was that multipliers don't

A lot of hashes also use XOR (^) quite extensively; but again, don't ask me why.

Winston

Mansukhdeep Thind wrote:I will probably give that article a pass for the time being. Too heavy for my intellectual level.

Well, one thing I discovered for myself when reading it was that multipliers don't

*necessarily*have to be prime. Apparently, 33 ((

`n << 5)`) also shuffles

__+__n`byte`bits pretty well (see 'Bernstein's hash') - but why? I have no idea.

A lot of hashes also use XOR (^) quite extensively; but again, don't ask me why.

Winston

Articles by Winston can be found here

Campbell Ritchie

Sheriff

Posts: 48386

56

posted 2 years ago

You can use XOR to get the information from the left half of a 64‑bit value and its right half simultaneously. If you find

Where ^ is the XOR operator, (int) cast discards the left 32 bits, and >>> is the unsigned right shift operator. And l is a bad name for a long variable. Remember >>> has a higher precedence than ^.

I think, in this instance, it would make no difference to the final result if you used >>.

The three operators ^ >>/>>> and cast can be executed very quickly, maybe 1 clock cycle each.

I don’t know anybody round here who is researching algorithms and can explain whether * 31 or * 33 or *17 gives a better distribution of 1s and 0s.

*Effective Java™*(Joshua Bloch), you find things like this;-Where ^ is the XOR operator, (int) cast discards the left 32 bits, and >>> is the unsigned right shift operator. And l is a bad name for a long variable. Remember >>> has a higher precedence than ^.

I think, in this instance, it would make no difference to the final result if you used >>.

The three operators ^ >>/>>> and cast can be executed very quickly, maybe 1 clock cycle each.

I don’t know anybody round here who is researching algorithms and can explain whether * 31 or * 33 or *17 gives a better distribution of 1s and 0s.

posted 2 years ago

That's true; but the same could be said of AND or OR. I seem to remember that is has something to do with the operation not being "reversible". As I recall, a well designed XOR encryption can be very difficult to decrypt, since the function imparts no knowledge of the components of the result. It also tends to 'flip' bits on a roughly 50/50 basis.

But beyond that, I'm well out of my depth.

Winston

Campbell Ritchie wrote:You can use XOR to get the information from the left half of a 64‑bit value and its right half simultaneously.

That's true; but the same could be said of AND or OR. I seem to remember that is has something to do with the operation not being "reversible". As I recall, a well designed XOR encryption can be very difficult to decrypt, since the function imparts no knowledge of the components of the result. It also tends to 'flip' bits on a roughly 50/50 basis.

But beyond that, I'm well out of my depth.

Winston

Articles by Winston can be found here