GeeCON Prague 2014*
The moose likes Programmer Certification (SCJP/OCPJP) and the fly likes How hashCode() works? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


JavaRanch » Java Forums » Certification » Programmer Certification (SCJP/OCPJP)
Bookmark "How hashCode() works?" Watch "How hashCode() works?" New topic
Author

How hashCode() works?

Arhaan Singhania
Ranch Hand

Joined: Mar 28, 2011
Posts: 89
Hi,
I know how exactly equals() and hashCode() works, what is their contract but i am eager to know how the hashing works
behind the scene. When does the JVM uses the hashCode to land the objects into the correct bucket. Any information would
be helpful.
Arhaan
Helen Ma
Ranch Hand

Joined: Nov 01, 2011
Posts: 451
For the purpose of the exam, the hash code of an object is the index of the 2D entry table behind a hashset or hashmap. For Integer, String, Double... , the hash code of them are unique. For example all Integer with int value = 1 has the same hashcode.

Objects are put into an entry table when HashSet/Map is implemented.
So, when you do aSet.add(new Integer(1)); here are the following steps implemented by HashSet/Map:
1. Search the entry table of index = the hash code of new Integer(1). For the sake of discussion, let's assume the hash code of it is 1. Look at entryTable[1] to see if there is an integer object.
2. If there is no integer object, add it to entryTable[1].
2.1 If there is some integer object(s) in entryTable[1], the equals method is used to compare if they are equal.
2.2. If they are equals, don't add the new Integer(1).
2.3. If they are not equals, add the new Integer(1).
This algorithm apply to all object type (such as Animal type that is used in KB's book).

When you do aSet.contain(new Integer(1)) ; here are the steps implemented:
1. The hash code of this object is calcuated during runtime. And it is 1.
2. search entrytable[1] to see if there is a new Integer(1).
2.1 If yes, return true; else false

Using hash code to search for objects in hash set or map is efficient because the hash code h is computed. And to search for the object , just look at entryTable[h] to look for it.

There are a lot of different algorithms to compute a unique hash code for objects. These algorithm out of the scope of the exam because there are a lot of researches about how to build such algorithms. This can be a phD thesis topic.
In the exam, usually for simplicity, hash codes are set to be the same for all instance of the same type.

The contract of hashcode and equals:
1. If two objects are equal based on your overriding equals method, their hashcode must be the same. In entryTable[h], no two equal objects are put in this h slot.
2. If two objects are not equals, their hash code can be the same or can be different. In entryTable[h], you can put a list of unequal objects with the same hashcode.
3. If two hash codes of two objects are the same, the two objects can be different or the same.
4. If the two hash codes of the two objects are different, the two objects must be different.

Arhaan Singhania
Ranch Hand

Joined: Mar 28, 2011
Posts: 89
Dear Helen,

What is the concept of 2-D entry table. Can you explain a little. Rest of the explanation is pretty clear and it makes sense to me.
Thanks a ton.

Thanks,
Arhaan
Helen Ma
Ranch Hand

Joined: Nov 01, 2011
Posts: 451
If you take a look at HashMap.java, the open source code, there is an instance field Entry[] table. Entry is a static inner class of HashMap.

Entry[] table is a 1D array actually. I need to correct this, instead of a 2D array, it is a 1D array. Each table[i] consists of an object or a list of objects, where i is the hash code to of the object.

For the example, we may not be asked about the Entry[] table that the HashMap has.

Let me use a simpler example to visualize the implementation of a hash set or map.

Suppose we have 3 buckets with a label on each of them. The labels are "R", "G", "B". The label represents hash code.
Suppose we have 9 balls of red, pink, pink, green, dark green, green, blue, blue and dark blue colors. Our game is to put the right ball into the right bucket with the matching color. But no bucket has more than 1 ball with the same color.
Suppose we will put pink and red balls into "R" bucket, green and dark green into "G" bucket, blue and dark blue into "B" bucket.

To implement the steps to put balls in the bucket:
1. First we pick a pink ball, which is represented by "R" label(hash code). We put it in "R" bucket.
2. Then, we pick a red ball, which is also represented by "R" label. We first check "R" bucket and there is no red ball. Then, we can add the red ball to the R bucket.
3. We pick another red ball. We check "R" bucket has already had a red ball. So, we don't add it.
Repeat the above steps for green and blue balls.

In this game, we can see pink ball and red ball are represented by the same "R" label (or hash code). But they are not equal because they have different colors. The contract is : if two objects have the same hash code, they may be equal or not equal. That is why the second red ball is not added to the R bucket because it is considered as being equal to the first red ball.
Also, we can see a red ball and a green ball are represented by different labels ,"R" and"G" (or hash code). They are not equal. The contract is: if two objects have different hash codes, they must not be equal.

ayush raj
Ranch Hand

Joined: Jan 15, 2012
Posts: 60
I have read K&B book on the hashCode and equals method . However , I could not get what actually the use of hash code is ? The bucket example was clear from the book but I would like to know how the String class implements it . What is meant by : " two objects are equal , if their hashcodes are same " ?
Helen Ma
Ranch Hand

Joined: Nov 01, 2011
Posts: 451
Hi, you can try to create two string objects with the same value and print out their hash code. For example,


Try to see if the hash codes are equal. I am sure they are equal.
For String objects, if they are meaningfully equal, their hash codes are equal.

Try to see if it works in Integer, Double, Float, Short.....
ayush raj
Ranch Hand

Joined: Jan 15, 2012
Posts: 60
I tried for Integer , Float .. but they are showing me output as the value and not the hashCode value . I wrote :



Why is it so ? Why do we require hashing in such cases ?
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

ayush raj wrote:I tried for Integer , Float .. but they are showing me output as the value and not the hashCode value . I wrote :



Why is it so ? Why do we require hashing in such cases ?


In the case of the Integer class, the hashcode *is* the same as the value... IMO, I disagree with this. Most applications don't use the full range of an integer -- there should be some scrambling of the bits to promote a more even distribution. Oh well.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
ayush raj
Ranch Hand

Joined: Jan 15, 2012
Posts: 60
Henry Wong:
I disagree with this


I don't know what you disagree with , but this output:- 10 10 , is being displayed by my JVM .
Helen Ma
Ranch Hand

Joined: Nov 01, 2011
Posts: 451
ayush raj wrote:I tried for Integer , Float .. but they are showing me output as the value and not the hashCode value . I wrote :



Why is it so ? Why do we require hashing in such cases ?


Each object has a hash code even though hash set is not used.

Look at the open source String.java developed by Lee Boynton and Aruthur van Hoff:

/**
* Returns a hash code for this string. The hash code for a
* <code>String</code> object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using <code>int</code> arithmetic, where <code>s[i]</code> is the
* <i>i</i>th character of the string, <code>n</code> is the length of
* the string, and <code>^</code> indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;

for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}




This is the algorithm to compute a unique hash code for a String object because on its value. According to KB, it will be a phD thesis research topic to develop algorithms to compute unique hash code for each object. But it won't be on the exam.

There is a hash code because we may want to put the object in Hashtable/HashSet/HashMap. Using a hash code as an index to search for an object is efficient. For a simplier example, every book has a index appendix. You want to search for a keyword
"HashSet". You will look at the "H" column to find "HashSet" instead of search a keyword from A to Z.



Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

ayush raj wrote:
Henry Wong:
I disagree with this


I don't know what you disagree with , but this output:- 10 10 , is being displayed by my JVM .



I am disagreeing with the implementation -- not with the output (I know what the output is ... ).

Hashing is an algorithm -- an algorithm that yields an average O(1). The purpose of the equals() / hashCode() contact is to ensure that the hashing collection works.... but just working isn't enough IMO. In order to have the average O(1), the hashing algorithm needs to be fast, also O(1), and yields an even distribution. Having the Integer objects class hashcode be the same as its value means that the hashing algorithm is at the mercy of how the application is using the Integer objects.

Henry
guru ghulab
Greenhorn

Joined: Dec 16, 2013
Posts: 1
also check out these links,

equal and hashcode
how hashcode works
java hashing

 
GeeCON Prague 2014
 
subject: How hashCode() works?