
hash collision avoid ?

kri shan
Ranch Hand

Joined: Apr 08, 2004
Posts: 1368
Is it possible to avoid hash collisions using the hashCode() and equals() methods?
Christian Dillinger
Ranch Hand

Joined: Jul 20, 2009
Posts: 188
No, it's not. Take a look at java.awt.Point. It has only two fields, x and y.

Given Point a, Point b:
a.equals(b) => true if a.x == b.x and a.y == b.y.

hashCode() returns an int, so there are only 2^32 possible hash code values, but there are more distinct (with regard to equals()) instances of java.awt.Point than that, so some of them have to share a hash code.
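
A minimal sketch of the same point (not from the original post, and using String rather than Point, simply because "Aa" and "BB" are a well-known pair of unequal strings that share a hash code):

public class CollisionExample {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";

        // Two distinct objects may legally share a hash code;
        // equals() is what ultimately distinguishes them.
        System.out.println(a.equals(b));   // false
        System.out.println(a.hashCode());  // 2112
        System.out.println(b.hashCode());  // 2112 -- same hash code, different strings
    }
}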
kri shan
Ranch Hand

Joined: Apr 08, 2004
Posts: 1368
Can I avoid collisions by using the default load factor of 0.75 and setting the right initial capacity (number of slots)?
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3370
Load factor and capacity have nothing to do with collisions. Collisions are purely affected by the quality of the hashCode() method.
Christian Dillinger
Ranch Hand

Joined: Jul 20, 2009
Posts: 188
kri shan wrote:Can I avoid collisions by using the default load factor of 0.75 and setting the right initial capacity (number of slots)?


No, you cannot. A collision is a collision is a collision. If there are more possible instances than possible hash code values, there MUST be collisions (if there are 367 people in one room, at least two of them must share a birthday). HashMap can handle them because it first checks the hash code and, if there is a collision, then checks equals().
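
A small sketch of what "HashMap can handle them" means in practice (not from the original post; the Key class here is made up for illustration). Even with a deliberately terrible hashCode() that always collides, lookups still return the right values, because equals() is consulted within the bucket:

import java.util.HashMap;
import java.util.Map;

public class AlwaysCollides {
    // Made-up key whose hashCode() forces every instance into the same bucket.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() {
            return 42; // every key collides on purpose
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key(1), "one");
        map.put(new Key(2), "two");

        // Both lookups succeed: the shared hash code only narrows the search
        // to one bucket; equals() picks the right entry inside it.
        System.out.println(map.get(new Key(1))); // one
        System.out.println(map.get(new Key(2))); // two
    }
}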
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
I think what Christian means to say is that in general it's not possible to completely avoid collisions. For any data type with more than four bytes of data, collisions will be possible. However, it's still possible to largely avoid them, in the sense of minimizing their chance of occurrence, by using a well-chosen hash function.

Stephan van Hulst wrote:Load factor and capacity have nothing to do with collisions. Collisions are purely affected by the quality of the hashCode() method.

I disagree. There are two types of collisions:

1. Two objects have the same hash code, and the HashMap needs to check the equals() method to see if they are the same.

2. Two objects have different hash codes, but end up in the same bucket anyway (because the current capacity is less than Integer.MAX_VALUE). In this case, the HashMap just needs to check the actual hashCode() of each object (which was saved when the object was put in the Map) to see if they are the same.

The first type of collision is determined only by hash code, sure. But the second certainly is affected by the capacity, which is affected by the load factor.

In general, setting a lower load factor will decrease collisions (of the second type, which is much more common) at the cost of using more memory. It's a trade-off.
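
For reference, both knobs are exposed on the HashMap constructor; a lower load factor (or a larger initial capacity) simply means more buckets per entry, so fewer collisions of the second type. A small sketch (the numbers are only for illustration):

import java.util.HashMap;
import java.util.Map;

public class CapacityAndLoadFactor {
    public static void main(String[] args) {
        // Defaults: capacity 16, load factor 0.75 -> resize after 12 entries.
        Map<String, Integer> defaults = new HashMap<>();

        // Roomier table: uses more memory, but entries are spread over more
        // buckets, so different hash codes are less likely to share a bucket.
        Map<String, Integer> roomy = new HashMap<>(1024, 0.5f);

        defaults.put("a", 1);
        roomy.put("a", 1);
    }
}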
Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

excellent explanation Mike
kri shan
Ranch Hand

Joined: Apr 08, 2004
Posts: 1368
if two objects have different hash codes, but end up in the same bucket anyway (because the current capacity is less than Integer.MAX_VALUE)

I do not think so.
If two objects have different hash codes, then they should be placed in different hash buckets.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18118

kri shan wrote:if two objects have different hash codes, but end up in the same bucket anyway (because the current capacity is less than Integer.MAX_VALUE)

I do not think so.
If two objects have different hash codes, then they should be placed in different hash buckets.


Well, no. There are 2^32 different hash codes possible. If you require different hash codes to go into different buckets then you would have to have 2^32 buckets. Obviously that doesn't make any sense at all.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36478
If you read the code, you find that the original hash code is rehashed and then AND-ed with capacity - 1. That means in a capacity-16 Map you execute rehash & 0x000f; in an array-based implementation, you use that result as the index in the array where the bucket is. That means any rehashes whose 4 least significant bits are the same will suffer a hash collision. When you get to 12 members (if the load factor is 0.75), the capacity is changed to 32, and you repeat the insertion with rehash & 0x001f instead. Now a hash collision occurs if the 5 least significant bits are the same.
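
A sketch of that index calculation (the supplemental rehash step varies between JDK versions, so this just applies the mask to two made-up hash values):

public class BucketIndexDemo {
    // Bucket index; this works because the capacity is always a power of two.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int h1 = 0x1234ABC5;
        int h2 = 0x9876FFF5; // same lowest 4 bits as h1, different lowest 5 bits

        // Capacity 16: only the 4 least significant bits matter -> same bucket.
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16)); // 5 5

        // After the resize to 32, 5 bits matter -> the two hashes separate.
        System.out.println(indexFor(h1, 32) + " " + indexFor(h2, 32)); // 5 21
    }
}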

You mustn't worry about hash collisions. They are not some sort of collision on the roads; they don't do any harm.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
Well, they can do "harm" in the sense that certain use cases can have very poor performance. (This was particularly true before they introduced the rehash.) Having a few hash collisions is no big deal, but having a lot of them can be bad.
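
A rough way to see that cost (a sketch, not a proper benchmark; timings vary by machine and JDK): fill one map with made-up keys that all collide and one with ordinary Integer keys, and compare.

import java.util.HashMap;
import java.util.Map;

public class CollisionCostSketch {
    // Made-up key whose hashCode() is constant, so every entry collides.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int hashCode() { return 1; }
    }

    public static void main(String[] args) {
        int n = 20_000;
        Map<Integer, Integer> good = new HashMap<>();
        Map<BadKey, Integer> bad = new HashMap<>();

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) good.put(i, i);
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) bad.put(new BadKey(i), i); // each insert scans the single shared bucket
        long t2 = System.nanoTime();

        System.out.printf("distinct hash codes: %d ms, constant hash code: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}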