aspose file tools*
The moose likes Java in General and the fly likes Duplicates in HashSet Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Duplicates in HashSet" Watch "Duplicates in HashSet" New topic
Author

Duplicates in HashSet

Singhal Anuj
Greenhorn

Joined: Mar 28, 2005
Posts: 8
Hi,
I am using jdk1.4.2.
I am using HashSet, and seem, it allows me to add duplicate values.

HashSet implements Set interface,
and in Set interface javadoc, it is clearly written that
"A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2)"
Similiar thing is written for add() javadoc.

My code snippet is as follows:-
--------------------------------------------------------------
import java.util.HashSet;

public class TestHashSet
{
public static void main(String arg[])
{
HashSet set = new HashSet();
set.add(new Inner(1));
set.add(new Inner(1));
set.add(new Inner(1));
System.out.println(set.size());
}
}
class Inner
{
private int i=0;
public Inner(int _i)
{
i=_i;
}
public boolean equals(Object o)
{
Inner inner = (Inner)o;
if(inner.i==this.i)
{
return true;
}
else
{
return false;
}
}

/* public int hashCode()
{
return i*1000;
} */
}
-------------------------------------------------------------

This code returns me size as 3.
If i uncomment hasCode() method,
it correctly prints me size as 1.
But, i believe, it should be using only equals() method to check duplication, and not hashCode()

After looking into implementation of HashSet class, found that it uses HashMap to store values.

I am wondering, where i am going wrong?
Please suggest.
thanks,
Anuj
Roger Chung-Wee
Ranch Hand

Joined: Sep 29, 2002
Posts: 1683
You are adding three new objects to the HashSet, so the size is three. The fact that the objects are of the same type is what is confusing you.

After looking into implementation of HashSet class, found that it uses HashMap to store values.

No, the HashSet's objects are internally stored as keys in a Hashmap. If memory serves me right, all the Hashmap's values are the same (an instance of an Object).


SCJP 1.4, SCWCD 1.3, SCBCD 1.3
Harish Kashyap
Ranch Hand

Joined: Jun 14, 2000
Posts: 118
When you add an element to HashSet/HashMap,
1) first the hashcode of object are matched against that of all other elements
2) then the object are matched with either == or .equals

And its added only when the hashcode is not same "and" == or.equals return false

So when you uncomment the hashCode method, only 1 element is added to the HashSet.

Moreover, as per definition of 'hashCode' method

If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of
the two objects must produce the same integer result.

So, for correct operations, its your responsibility to override the 'hashCode' method if you are overriding the 'equals' method.
Srivatsan santhanam
Greenhorn

Joined: Jan 04, 2006
Posts: 23
yes.. provide your own unique implementation of
hashCode and equals(..) for Inner class.


Java Objects passed by Reference ?? -> you are a failure !!
Singhal Anuj
Greenhorn

Joined: Mar 28, 2005
Posts: 8
Hi ,
thanks for the input.
yes, you are right, here, reason that i have not overridden hashCode() method appropriately.
But, i would like to address another aspect of it:-
From API of hashCode() in Object class,
-------------------------------------------------------------
3) It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
--------------------------------------------------------------
Two unequal objects, say o1 and o2, are allowed to return same hashCode,
but going by the Theory of "added in HashSet only when the hashcode is not same "and" == or.equals return false"
suggests that o1 and o2 will not be added in HashSet inspite being unEqual()

I am trying to convey that hashCode should not be the criteria to detect Duplication.
Let me know, if i am wrong.
thanks,
Anuj
Harish Kashyap
Ranch Hand

Joined: Jun 14, 2000
Posts: 118
That is actually true....
If 2 different objects return same hashcode, only first object will be added to the HashMap.

If you see the src of HashMap

Harish Kashyap
Ranch Hand

Joined: Jun 14, 2000
Posts: 118
I would like to correct myself here...

An object is added in HashMap if

1. hashCode doesn't match
2. hashCode matches, but .equals() or || operator return false
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24187
    
  34

Originally posted by Harish Kashyap:
That is actually true....
If 2 different objects return same hashcode, only first object will be added to the HashMap.


No, this is most certainly not true, and I don't see how you think the source code shows this.

What the Javadoc says is that two equal objects must have the same hashCode(), but two unequal objects should have different hashCodes(). There's generally no way to ensure that two objects are guaranteed to have different hashCode values, and that's OK -- nothing will break, although having different values for unequal objects gives the best performance. BUt if a class had a hashCode() method that returned "1" for every instance of a class, HashSet would not malfunction as long as equals() properly compared instances of the class.

Imagine that you have a large stack of index cards with people's names on them. You have to sort the cards into alphabetical order. How do you do it? Here's one way: first, make 26 piles, one for each letter of the alphabet; then sort each pile. By first making smaller piles, you reduce the amount of work you have to do comparing and ordering cards.

This is exactly how HashMap/HashSet/Hashtable work. The hashCode() of the keys is used to sort the objects into piles. Then the objects in each pile are compared for equality using equals(). If two objects have the same hashCode(), then they end up in the same pile, and if two of them are equal(), then one will be discarded. If, on the other hand, two objects have different hashCodes, they will be in different piles and they will never be directly compared.

So you must override hashCode() so the objects end up in the right piles, and equals() so they are compared correctly. OK?


[Jess in Action][AskingGoodQuestions]
 
Don't get me started about those stupid light bulbs.
 
subject: Duplicates in HashSet