This week's book giveaway is in the Big Data forum. We're giving away four copies of Elasticsearch in Action and have Radu Gheorghe & Matthew Lee Hinman on-line! See this thread for details.

I can understand the purpose of an Object having a unique hashCode() so that it goes into the correct bucket, searching operations are more efficient and the Collection classes behave in a predictable manner. But what is the purpose of List.hashCode()? Is it to search a List from a collection of Lists? Could someone provide some real life example when one might use it?

All objects have a hashCode() method, because hashCode() is defined in class java.lang.Object, which is the superclass of all classes. So, since a List is also an object, it ofcourse also has a hashCode() method.

The hashCode() method is needed for objects that you put in a HashSet or objects that you use as keys in a HashMap. If, for example, you have a HashSet that contains Lists, then the hashCode() method of the Lists will be called to organize them in the internal data structure of the HashSet.

Jesper de Jong wrote:All objects have a hashCode() method, because hashCode() is defined in class java.lang.Object, which is the superclass of all classes. So, since a List is also an object, it ofcourse also has a hashCode() method.

Knew this part.

Jesper de Jong wrote:The hashCode() method is needed for objects that you put in a HashSet or objects that you use as keys in a HashMap. If, for example, you have a HashSet that contains Lists, then the hashCode() method of the Lists will be called to organize them in the internal data structure of the HashSet.

This is what I wanted to know. Lists go into separate buckets of the HashSet according to the hashCode() that is calculated for each List object according to the elements' hashCode() that are contained in the List. Thanks Jesper.

Mansukhdeep Thind wrote:This is what I wanted to know. Lists go into separate buckets of the HashSet according to the hashCode() that is calculated for each List object according to the elements' hashCode() that is contains in the List.

Another way to think of them as is as a "digest" for the List - ie, if two Lists have the same hashcode, then there's a fair chance that they contain the same values; if the hashcodes are different, you know that they can't.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here

Winston wrote: ... if two Lists have the same hashcode, then there's a fair chance that they contain the same values

How is this possible Winston? The List.hashCode() method is implemented as :

How can 2 different Lists have same hash code? Each element of each List will have a unique hash code, and hence the List itself will have a unique hash code obtained by adding 31*hashcode to the previous summation as shown above. Am I wrong in thinking so? The above hashCode() overriding seems correct to me.

Mansukhdeep Thind wrote:How can 2 different Lists have same hash code?

There's nothing strange about that. It's simple to prove that there must be different lists with the same hash code: hashCode() returns a 32-bit int, which means that there are 2^32 = 4,294,967,296 possible hash codes. And there are certainly more than 4,294,967,296 possible different lists, so there must be different lists that have the same hash code.

Given the above implementation of List.hashCode(), it's easy to show an example with two different lists that have the same hash code:

Mansukhdeep Thind wrote:Each element of each List will have a unique hash code

Not necessarily.

Remember the hashCode / equals contract: If a.equals(b) == true, then a.hashCode() == b.hashCode() must be true. But not necessarily the other way around; if a.hashCode() == b.hashCode(), that does not mean that a.equals(b) == true.

Mansukhdeep Thind wrote:How is this possible Winston? The List.hashCode() method is implemented as :
blah

Yes, but it could just as easily have been implemented as:
return 42; a perfectly legal - albeit pointless - hashCode() implementation for ANY class. And I think Jesper's covered the rest.

The basic idea of a good hashcode (and there are plenty of bad ones around, believe me) is to spread values evenly across the buckets of a hashed collection. A related (but not identical) property of this requirement is that different values tend to produce different results; but it's absolutely impossible to guarantee.

Think about it: A String containing only 3 characters already encapsulates more bits of data than there are in an int - and that doesn't even take into consideration the length attribute, which would probably be a good idea to include in its hashcode as well.

Mansukhdeep Thind wrote:How can 2 different Lists have same hash code?

There's nothing strange about that. It's simple to prove that there must be different lists with the same hash code: hashCode() returns a 32-bit int, which means that there are 2^32 = 4,294,967,296 possible hash codes. And there are certainly more than 4,294,967,296 possible different lists, so there must be different lists that have the same hash code.

Understood.

Jesper de Jong wrote:

Mansukhdeep Thind wrote:Each element of each List will have a unique hash code

Not necessarily.

Remember the hashCode / equals contract: If a.equals(b) == true, then a.hashCode() == b.hashCode() must be true. But not necessarily the other way around; if a.hashCode() == b.hashCode(), that does not mean that a.equals(b) == true.

Ohhh! OK Now it makes sense why the contract is the way it is. If it were the other way round then it would be the perfect hashing function and there wouldn't be such a fuss about this contract between equals() and hashCode().

One last thing Jesper, can't we use prime numbers to formulate such a unique hashing function such that every time we add the element to a hashing collection class object, it goes into a different bucket? Is it impossible or just that it hasn't been achieved yet by mathematicians?

Mansukhdeep Thind wrote:One last thing Jesper, can't we use prime numbers to formulate such a unique hashing function such that every time we add the element to a hashing collection class object, it goes into a different bucket? Is it impossible or just that it hasn't been achieved yet by mathematicians?

It's impossible, for all the reasons given already: There are simply too many possibilites to guarantee uniqueness.

Prime numbers are indeed used a lot, but simply as a method of "shuffling" bits, not to guarantee uniqueness. Indeed, you'll see 17 and 31 used quite a lot, since many optimizing compilers are aware that
n * 17 == (n << 4) + n and
n * 31 == (n << 5) - n as well as them both having good "bit shuffling" characteristics when it comes to ints.

If you're really interested in this stuff, I suggest you check out this site, because the guy (I think his name is Bob Jenkins) has probably forgotten more about hash algorithms than you or I are ever likely to know.

Mansukhdeep Thind wrote:I will probably give that article a pass for the time being. Too heavy for my intellectual level.

Well, one thing I discovered for myself when reading it was that multipliers don't necessarily have to be prime. Apparently, 33 ((n << 5) + n) also shuffles byte bits pretty well (see 'Bernstein's hash') - but why? I have no idea.

A lot of hashes also use XOR (^) quite extensively; but again, don't ask me why.

Winston

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 41098

29

posted

0

You can use XOR to get the information from the left half of a 64‑bit value and its right half simultaneously. If you find Effective Java™ (Joshua Bloch), you find things like this;-

Where ^ is the XOR operator, (int) cast discards the left 32 bits, and >>> is the unsigned right shift operator. And l is a bad name for a long variable. Remember >>> has a higher precedence than ^.
I think, in this instance, it would make no difference to the final result if you used >>.
The three operators ^ >>/>>> and cast can be executed very quickly, maybe 1 clock cycle each.
I don’t know anybody round here who is researching algorithms and can explain whether * 31 or * 33 or *17 gives a better distribution of 1s and 0s.

Campbell Ritchie wrote:You can use XOR to get the information from the left half of a 64‑bit value and its right half simultaneously.

That's true; but the same could be said of AND or OR. I seem to remember that is has something to do with the operation not being "reversible". As I recall, a well designed XOR encryption can be very difficult to decrypt, since the function imparts no knowledge of the components of the result. It also tends to 'flip' bits on a roughly 50/50 basis.

But beyond that, I'm well out of my depth.

Winston

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 41098

29

posted

0

Why you use XOR instead of AND or (inclusive) OR? You have said it yourself. It tends to flip bits on a roughly 50/50 basis. Exactly what you want for a hash code.