aspose file tools*
The moose likes Java in General and the fly likes Problem with equals on collection Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "Problem with equals on collection" Watch "Problem with equals on collection" New topic
Author

Problem with equals on collection

John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
I have a run time error that causes incorrect results, but I am having trouble understanding what is causing the error.
At the point where there error occurs, two lists are being compared. The problem is the comparison is using the wrong equal function.

The code data structure looks something like this



The data assignment looks like



Here's what the problem looks like



The problem is in the line

boolean sameOutputs = treeOutputs.containsAll(combOutputs); <-- This line doesn't work

That line uses the equal function from class ExtendedData, instead of the function from BaseData. And, I can't tell why. The line before it uses the proper function. The first question is how to determine why the ExtendedData equal function is being used, and also how to modify the code to use the BaseData equal function.

Which brings up a second question. In Eclipse, how can I access ArrayList<E>.containsAll(Collection<?>) function. I have jdk1.7.0 src.zip in the build path. It may be helpful if I could step into containsAll to track what it going on. What should I change in Eclipse so that I can step into containsAll.
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

Hi John. Can you please PostRealCode? I have a hard time thinking the program you posted would compile at all.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
To get "past" the problem, I modified the equal function of ExtendedData to check if the size of the history is zero, after check if super is equal. If the size of either argument is zero, then it returns true. But, I still can't understand why ExtendedData.equals is being called on a comparison between List<BaseData > and ArrayList<BaseData> objects.

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

If you post actual code, maybe I can help answer your question.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
Stephan van Hulst wrote:Hi John. Can you please PostRealCode? I have a hard time thinking the program you posted would compile at all.


I only posted pseudo scripts, to show the data structure, and the loop where the problem occurs. The problem occurs because of an issue with collections.equals, and I don't know why it uses the inherent class instead of the base class. I can't post the original code, I'll see if I can make a similar example.
dennis deems
Ranch Hand

Joined: Mar 12, 2011
Posts: 808
What is the type of object that the lists contain?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18657
    
    8

John Vorwald wrote:That line uses the equal function from class ExtendedData, instead of the function from BaseData. And, I can't tell why. The line before it uses the proper function. The first question is how to determine why the ExtendedData equal function is being used, and also how to modify the code to use the BaseData equal function.


If you have an ExtendedData object and you call its equals() method, and the ExtendedData class specifically overrides equals(), then that overridden method is what is going to be called. The fact that some BaseData variable somewhere refers to that object is irrelevant. Look up "polymorphism" if you didn't already know that.

As for the second part of your question, if you don't want ExtendedData objects to be compared for equality via the equals() method you specifically wrote for the ExtendedData class, then you shouldn't have written that method.

As you are finding, writing different equals() methods for superclass and subclass can be problematic: for example if you write

then the two calls to "equals()" call different methods which might return different values. And it's extremely hard to write the two methods, one of which overrides the other, so that they always return the same value in this situation. And presumably you haven't done that.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
The example below simulates the problem that the actual program is experiencing.

The questions are:

1) In the main function, is it possible to access the BaseFoo equals function?

2) In more general terms, I have collections of objects that contain of collections of data. Sometimes, I want to compare both the data in the object and the identifier, using BaseFooWithData equals function. Othertimes, I only need to compare the identifiers, and would like to use BaseFoo equals function.

So, the more general question is: when a collection is based on objects that are derived, how to access a lower equals function?

3) Or, now after writing all this, it seems that the fundamental question is "When an object is derived from more than one super class, how to access overridden functions in the lower super classes?"




Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

John Vorwald wrote:The example below simulates the problem that the actual program is experiencing.

The questions are:

1) In the main function, is it possible to access the BaseFoo equals function?


No. Not from outside the class anyway. BaseFoo's subclass can call super.whatever(), but you can't do that from outside.

If you sometimes want to compare using some criteria, and sometimes using others, then you should either write two different classes, or use a data structure that accepts a Comparator, such as a Set.

2) In more general terms, I have collections of objects that contain of collections of data. Sometimes, I want to compare both the data in the object and the identifier, using BaseFooWithData equals function. Othertimes, I only need to compare the identifiers, and would like to use BaseFoo equals function.


You cannot chose whether polymorphism gets used or not. It always will if it is possible. For cases like list.contains(), you have the choices above, since contains() is always going to call equals(), which will always be the "deepest" overriding version.

On the other hand, you could create your own method in the child class, say, "parentEquals()", that just calls super.equals(), and then write your own search function that calls that. Depending on how common this situation is for you, you might want to define an interface with the parentEquals() method, and have the relevant classes implement it. Then your search method signature might look like public boolean containsByParentEquals(Collection<ParentEqualsInterface> c, ParentEqualsInterface element)

Though I'm not sure that the whole approach of sometimes calling parent's equals() is even a reasonable design.

So, the more general question is: when a collection is based on objects that are derived, how to access a lower equals function?


In general, you don't. In over a decade of programming in Java professionally I have never once needed to. If you genuinely need to--and this isn't just misguided approach to whatever problem you're trying to solve, which seems the more likely case to me--then you're options are pretty much as above.

3) Or, now after writing all this, it seems that the fundamental question is "When an object is derived from more than one super class, how to access overridden functions in the lower super classes?"


Again, you should never need to.

Also, I think most people picture superclasses as "higher" in the tree. It's an upsidedown tree, with the base at the top and branches spreading out going downward. At least, I assume that's how most people picture it, because that's how I do.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    
  22

John Vorwald wrote:2) In more general terms, I have collections of objects that contain of collections of data. Sometimes, I want to compare both the data in the object and the identifier, using BaseFooWithData equals function. Othertimes, I only need to compare the identifiers, and would like to use BaseFoo equals function.

Which is precisely the problem that everybody has been trying to explain to you; and it has nothing to do with collections. It has to do with the equals() method itself and how it works polymorphically in a class hierarchy.

First off, there are 5 rules about equals() that you need to know about (detailed, I think, in java.lang.Object):
1. It must be reflexive: ie, x.equals(x) must return true.
2. It must be symmetric: ie, x.equals(y) must return the same value as y.equals(x).
3. It must be transitive: ie, if x.equals(y) is true and y.equals(z) is true, then x.equals(z) must also be true.
4. It must be consistent: ie, multiple invocations of x.equals(y) must return the same value.
5. x.equals(null) must return false.
(All of the above assume that x, y and z are all non-null)

And the one you are almost certainly violating is Rule number 2 - symmetry; and the fact of the matter is, it is a basic problem of equals() when used in hierarchies.
As Josh Bloch puts it: "there is no way to extend an instantiable class and add a value component while preserving the equals() contract" (EJv2 p.38).

Some people think you can get around the problem by simply using getClass() instead of instanceof and have all inter-class comparisons return false; but in fact all that does is violate another rule: the Liskov Substitution Principle, which manifests itself a bit more subtlely.

One option that JB suggests is to use composition instead of inheritance, and add a 'view' method to the composite, but I'm not sure if even that will help in your case. You might want to check it out though.

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Winston Gutkowski wrote:
Some people think you can get around the problem by simply using getClass() instead of instanceof and have all inter-class comparisons return false; but in fact all that does is violate another rule: the Liskov Substitution Principle, which manifests itself a bit more subtlely.


Though it's rare, there can be cases where an object should only be able to be equal to another object of the same class, and not to an instance of a super or subclass. (And yes, sometimes it's okay to violate LSP, and LSP itself can be interpreted with varying degrees of strictness--to the point where any change to implementation can be argued to violate it.) In this case, use getClass(), not instanceof.

In terms of strictly honoring the equals() contract, the rule is as follows: If any type in a hierarchy uses instanceof, then every one of its subtypes must also use instanceof, and must compare against the same type.
John Vorwald
Ranch Hand

Joined: Sep 26, 2010
Posts: 139
Thank you for the comments.

It seems that "equals" is not what I had in mind, I have in mind a function in the base class that compares the identifier, to determine if the data name/source are the same, independent of the data. It makes sense in the base class to have that functionality in the equals function. But, it doesn't make sense in the derived classes for the reasons given.

Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

John Vorwald wrote:Thank you for the comments.

It seems that "equals" is not what I had in mind, I have in mind a function in the base class that compares the identifier, to determine if the data name/source are the same, independent of the data. It makes sense in the base class to have that functionality in the equals function. But, it doesn't make sense in the derived classes for the reasons given.



Depending on when you'll be wanting one set of semantics vs. the other, it may still make sense to define equals to implement one of those comparisons (the one that you'll use for Collection.contains(), etc.), and just define a custom method for the cases where you need to do the other comparison.
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

I never got this. Why does using getClass() violate LSP? If a 'striped feline' can decide it's not the same as a 'black feline', then why can't it decide it's not the same as a 'striped tiger'? Is it because a 'striped tiger' is-a 'striped feline'? But that begs the question, are all 'striped felines' the same?

Even if it *does* violate this principle for some reason I can't fathom, what does it matter? Are there situations that could go horribly wrong when we don't adhere to it? Real life examples of applications or libraries that failed because someone got this wrong?
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Stephan van Hulst wrote:I never got this. Why does using getClass() violate LSP?


Because by some interpretations of LSP, Parent.foo() should produce the exact same results as Child.foo(), so if Parent1.equals(Parent2) gives true, then Parent2.equals(Child1) and Child1.equals(Parent2) should also give true (assuming the relevant fields are equal).

Even if it *does* violate this principle for some reason I can't fathom, what does it matter? Are there situations that could go horribly wrong when we don't adhere to it? Real life examples of applications or libraries that failed because someone got this wrong?


Well, no. Like so many other principles we encounter, it's a guideline. If you find you're violating it, you should take a look at why and see if it makes sense and is worth it in that particular situation. I expect that the negative results of violating LSP are much less in code failing and more in increased maintenance costs (and perhaps code failing as a secondary effect, due to more difficult maintenance).
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

But then doesn't *all* method overriding violate LSP? Personally I think the LSP is too vague to be useful.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Stephan van Hulst wrote:But then doesn't *all* method overriding violate LSP?


Yes, as I stated earlier, if you take it strictly enough.

Personally I think the LSP is too vague to be useful.


I suppose if you can find a level where it reminds you what the general blueprint for inheritance is, it can serve as a decent guildeline.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    
  22

Stephan van Hulst wrote:I never got this. Why does using getClass() violate LSP? If a 'striped feline' can decide it's not the same as a 'black feline', then why can't it decide it's not the same as a 'striped tiger'? Is it because a 'striped tiger' is-a 'striped feline'? But that begs the question, are all 'striped felines' the same?
Even if it *does* violate this principle for some reason I can't fathom, what does it matter? Are there situations that could go horribly wrong when we don't adhere to it? Real life examples of applications or libraries that failed because someone got this wrong?

Unfortunately, things like animal hierarchies don't really illustrate the point because very often the "subtype" actually indicates that two 'Felines' aren't equal anyway.

Take, however, the java.awt.Point class which, for reasons I've never quite been able to fathom, is mutable.

Suppose I wanted to extend it to create an ImmutablePoint class that, as far as I'm concerned, is simply a Point that can't be changed. I could then set up a HashSet<Point> and hopefully have it contain any mixture of Points and ImmutablePoints I want. The trouble is that if the person who wrote the Point class's equals() method used getClass() rather than instanceof, the HashSet would never recognize one of my ImmutablePoints as equal to an existing Point.

Worse still, if I override its equals() method to use instanceof, so that I can compare my ImmutablePoints with regular ones, I have now, unknowingly, violated the symmetry rule for equals(), because for any ImmutablePoint x and Point y, x.equals(y) might be true while y.equals(x) will always be false. And that could cause some very interesting behaviour in my HashSet<Point>.

So now I have to know whether the class I'm extending used getClass() or instanceof to implement its equals() method, which seems an awful lot like "unhiding" to me.

Winston
Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3647
    
  17

I see what you're saying, but all the examples I have seen regarding the LSP involved subclassing broken classes.

A point is a point is a point. A point can't change it's coordinates, or it would be a different point. You're right to imply it's silly that Point is mutable. Why fix it with a subclass? Why not write a proper Point class that's immutable and has no relation to the original?

Let me put it this way. I've never seen the LSP be useful, because it only applies to cases where using other - less vague - guidelines or simple common sense would fix the same issues.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    
  22

Stephan van Hulst wrote:A point is a point is a point. A point can't change it's coordinates, or it would be a different point. You're right to imply it's silly that Point is mutable. Why fix it with a subclass? Why not write a proper Point class that's immutable and has no relation to the original?

Because I may need to use my ImmutablePoint class in conjunction with some application (a GUI?) that is already using Point, and which I can't change. So now I have to write an
ImmutablePoint(Point)
constructor (and create a lot of new objects) simply because some bright spark decided (or was told) that getClass() is a wonderful way to write "safe" equals() methods, even though it causes it to behave in an undocumented manner in collections involving mixtures of types.
(I'm pretty sure that Point doesn't use getClass(), by the way ).

However, there is an even more insidious problem with using getClass(): The Class now becomes one of the values involved in the equals() method; but how many people actually remember to include its hashcode in the hashCode() method?

Winston
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Winston Gutkowski wrote:
However, there is an even more insidious problem with using getClass(): The Class now becomes one of the values involved in the equals() method; but how many people actually remember to include its hashcode in the hashCode() method?


The class--or some class or interface--is involved with equals() even when you do instanceof. There's no need for it to be part of hashCode in either situation. You'll never get a wrong answer from putting too little into hashCode(), only if you put something into it that isn't in equals(). Adding the class (or the type we check against for instanceof) to hashCode() would not help at all in most cases.
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008
    
  22

Jeff Verdegan wrote:There's no need for it to be part of hashCode in either situation. You'll never get a wrong answer from putting too little into hashCode(), only if you put something into it that isn't in equals(). Adding the class (or the type we check against for instanceof) to hashCode() would not help at all in most cases.

Dunno. It would kind of depend on the situation. While I agree that you'll never get a wrong answer by putting too little in, you may well get a non-optimal hash.

In the case of a hierarchy that uses getClass(), not putting it in might well cause objects to be put in the same bucket of a hashed collection when they shouldn't be (ie, differing subtypes with the same basic ingredients, or differing only in values that can't be included, such as mutable ones).
In the case of one that uses instanceof - yes, in theory you should, but it's much less likely to affect the outcome, since in many cases equals() would return true anyway. Also, with an instanceof-based hierarchy, I can start out my hashcode calculation with super.hashCode(); with a getClass() one I would still need to include my class's hash, since it is a determinator of equality.

The trouble with all these animal-like examples is that the chances are they wouldn't actually need to override equals() at all.

The fact is that (unless you haven't guessed) I dislike getClass()-based equals() methods, because they must either be (a) documented as using it, or (b) final.

Yet another argument for favouring composition over inheritance perhaps .

Winston
 
wood burning stoves
 
subject: Problem with equals on collection