hashCode/equals strategy for mutable object

Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

(This may be only intermediate.)

I'm struggling with an optimal approach to implementing both equals() and hashCode() for an object that has a lot of mutable fields. Say a human/person record. You have to be able to change the name parts, age, sex, etc., but when you do that, it really breaks HashMaps, etc., because the hashCode changes.

Consider an example:


You want the equals() code to look at the values of fname, lname, etc. But it's also obvious that in any non-trivial application, you have to let people change their name, etc.
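A minimal sketch of the failure being described, not the code from the original post. The class name MutablePerson and its setter are hypothetical; only the field names fname and lname come from the post.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical person record whose equals()/hashCode() use mutable fields.
class MutablePerson {
    private String fname;
    private String lname;

    MutablePerson(String fname, String lname) {
        this.fname = fname;
        this.lname = lname;
    }

    void setLname(String lname) { this.lname = lname; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutablePerson)) return false;
        MutablePerson other = (MutablePerson) o;
        return Objects.equals(fname, other.fname) && Objects.equals(lname, other.lname);
    }

    @Override
    public int hashCode() { return Objects.hash(fname, lname); }
}

class MutableKeyDemo {
    // Returns whether the set can still find the person after a name change.
    static boolean foundAfterRename() {
        Set<MutablePerson> people = new HashSet<>();
        MutablePerson p = new MutablePerson("Pat", "Smith");
        people.add(p);          // stored under the hash of ("Pat", "Smith")
        p.setLname("Jones");    // hash changes, but the set entry does not move
        return people.contains(p);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterRename()); // false: the lookup uses the new hash
    }
}
```

The object is still in the set and still shows up when iterating, but hash-based lookups can no longer reach it.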

What's a good philosophy for this?
pete stein
Bartender

Joined: Feb 23, 2007
Posts: 1561
Perhaps you should only use the invariants: the ID, the birth date, and perhaps the name at birth.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

pete stein wrote:Perhaps you should only use the invariants: the ID, the birth date, and perhaps the name at birth.


I can't think of any invariants. People make typos, fields get changed. My daughter's driver's license said she was a boy for years; they typo'd it when she first got the license, and it took a lot of paperwork to change (although one look at her at age 18 gave really strong hints that she was not a boy).

While the ID is the primary key in the database, it's a lousy field for telling if humans are equal. Perhaps two otherwise identical records are two people, perhaps they are one person entered twice into the system, and our duplication check code failed. It's a hard problem.

pete stein
Bartender

Joined: Feb 23, 2007
Posts: 1561
You will never get a perfect solution to any problem, just a good enough solution and this will have to do. You may not be happy with this, but there it is.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
You could require that any time one of those fields is changed, the object must first be removed from any maps or sets it's in. Remove the original object, make the change, and reinsert. This can work if there's a fixed list of these maps and sets, firmly under your control. Otherwise, it could get messy.
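A minimal sketch of the remove, change, reinsert sequence for a single set. NamedPerson and ReinsertDemo.rename are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class whose equals()/hashCode() depend on a mutable field.
class NamedPerson {
    private String lname;

    NamedPerson(String lname) { this.lname = lname; }

    void setLname(String lname) { this.lname = lname; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedPerson && Objects.equals(lname, ((NamedPerson) o).lname);
    }

    @Override
    public int hashCode() { return Objects.hashCode(lname); }
}

class ReinsertDemo {
    // Remove, mutate, reinsert. Only safe if every hash-based collection
    // holding p is updated the same way.
    static void rename(Set<NamedPerson> people, NamedPerson p, String newLname) {
        boolean wasPresent = people.remove(p); // must run while the old hash is still valid
        p.setLname(newLname);
        if (wasPresent) {
            people.add(p);                     // stored again under the new hash
        }
    }

    public static void main(String[] args) {
        Set<NamedPerson> people = new HashSet<>();
        NamedPerson p = new NamedPerson("Smith");
        people.add(p);
        rename(people, p, "Jones");
        System.out.println(people.contains(p)); // true
    }
}
```

The order matters: calling remove() after the setter would look in the wrong bucket and leave the stale entry behind.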
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

Mike Simmons wrote:You could require that any time one of those fields is changed, the object must first be removed from any maps or sets it's in. ....could get messy.


I have thought along these lines, and it's messy. How does an object know how many maps, hashsets, lists, etc. it's in? It's really unknowable, which is harder than NP-hard.

If you had an object in a HashMap and in an ArrayList or vector, you would have to remove and replace it in both. Very ugly. It would lead to object listeners, and zillions of callbacks.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
Pat wrote:How does an object know how many maps, hashsets, lists, etc. it's in? It's really unknowable, which is harder than NP-hard.

You could make it the responsibility of the client to register any maps or sets that the instance partakes in. This could be at the class level or instance level. E.g. at the class level: class Foo could maintain a static collection of maps, and another of sets. Clients of Foo who want to use Foo instances in a map (as key) or set would need to call Foo.registerFooMap(map) or Foo.registerFooSet(set), which would add the map or set to the appropriate collection. Then all mutator methods of Foo would iterate over the two collections and remove the current Foo instance from all maps and sets, perform the mutation, and then re-insert the instance in the maps and sets. For maps, it would also need to remember and restore all the values associated with a given key.

Here I'm assuming that there are maps and sets responsible for tracking all the Foos in a given JVM. You can do this stuff at an individual object level instead, though it does lead to an icky proliferation of objects as each Foo would contain lists of maps and sets.

This scheme doesn't really appeal to me that much, as it's very easy for a client to simply fail to register. But that's their problem, really. Anyway a scheme like this may still be better than nothing.
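A rough sketch of the class-level registration idea, limited to sets for brevity. Foo and registerFooSet come from the post; the name field, setName, and everything else are assumptions. This version is not thread-safe, and the static list keeps every registered set reachable for the life of the class.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class Foo {
    // Class-level registry of every set that clients have declared.
    private static final List<Set<Foo>> REGISTERED_SETS = new ArrayList<>();

    private String name;

    Foo(String name) { this.name = name; }

    static void registerFooSet(Set<Foo> set) { REGISTERED_SETS.add(set); }

    void setName(String newName) {
        // Pull this instance out of every registered set that holds it,
        // while its hash code is still the one it was stored under.
        List<Set<Foo>> holders = new ArrayList<>();
        for (Set<Foo> set : REGISTERED_SETS) {
            if (set.remove(this)) {
                holders.add(set);
            }
        }
        this.name = newName;
        // Put it back so it lands in the bucket for the new hash code.
        for (Set<Foo> set : holders) {
            set.add(this);
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Foo && Objects.equals(name, ((Foo) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

class RegistryDemo {
    public static void main(String[] args) {
        Set<Foo> foos = new HashSet<>();
        Foo.registerFooSet(foos);
        Foo foo = new Foo("before");
        foos.add(foo);
        foo.setName("after");
        System.out.println(foos.contains(foo)); // true
    }
}
```

A set that was never registered gets no such protection, which is the weakness noted above.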

Another option: make Foo immutable, except for one field. All "mutators" should instead create a new instance, like we have for String. The one exception is that Foo also has a new mutable boolean, isValid. This flag is true when a Foo is created, but as soon as any "mutator" is called, it's set to false. Anyone still holding a reference to the old Foo can discover that it's no longer valid. You could even have equals() and hashCode() throw an exception if isValid is false. So if anyone is still using an old Foo instance in a map or set, they can get an exception complaining about the problem. It's still the client's responsibility to update maps and sets as necessary - the isValid check just makes it easier to detect if some client fails in this responsibility.
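One way this could look, as a sketch only. The isValid flag is from the post; the class name ValidatedFoo, the withName method, and the choice of IllegalStateException are assumptions.

```java
import java.util.Objects;

final class ValidatedFoo {
    private final String name;
    private boolean valid = true; // the single mutable field

    ValidatedFoo(String name) { this.name = name; }

    boolean isValid() { return valid; }

    // "Mutator": marks this instance stale and returns its replacement.
    ValidatedFoo withName(String newName) {
        valid = false;
        return new ValidatedFoo(newName);
    }

    private void checkValid() {
        if (!valid) {
            throw new IllegalStateException("Stale instance: " + name);
        }
    }

    String getName() {
        checkValid();
        return name;
    }

    @Override
    public boolean equals(Object o) {
        checkValid();
        return o instanceof ValidatedFoo && Objects.equals(name, ((ValidatedFoo) o).name);
    }

    @Override
    public int hashCode() {
        checkValid();
        return Objects.hashCode(name);
    }
}

class ValidatedFooDemo {
    public static void main(String[] args) {
        ValidatedFoo old = new ValidatedFoo("Smith");
        ValidatedFoo current = old.withName("Jones");
        System.out.println(current.getName()); // Jones
        System.out.println(old.isValid());     // false
    }
}
```

Because the hash code of any one instance never changes, a stale instance stays findable in its old bucket; the exception only fires when someone actually touches it.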

Pat wrote:If you had an object in a HashMap and in an ArrayList or vector, you would have to remove and replace it in both. Very ugly. It would lead to object listeners, and zillions of callbacks.

I don't think there's any reason to worry about ArrayLists or Vectors - these don't depend on hashCode() at all. And when they do depend on equals(), e.g. for contains(), they always use the current data for comparison. There's no caching of some old data the way there is in a hash table or hash set, where the bucket position is based on the original hash code, and thus may be out of date.

Other than that, yes, I agree, it can get ugly. Still, anyone who's got out-of-date data as a map key or in a set, that's a problem, and it needs fixing. If an ugly solution is the only one you have (after waiting for other, better ideas, which would be nice if they exist), you may just have to use an ugly solution.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

Mike Simmons wrote:I don't think there's any reason to worry about ArrayLists or Vectors - these don't depend on hashCode() at all. And when they do depend on equals(), e.g. for contains(), they always use the current data for comparison. There's no caching of some old data the way there is in a hash table or hash set, where the bucket position is based on the original hash code, and thus may be out of date.


The reason you must pay attention is that if you use your "looks immutable" approach, then any call to any set* will create a new Foo object. So if there are old flavors of the Foo in an ArrayList/Vector, or anywhere, when you change it, you either chase down all the locations, or have two (or more) objects that claim to be the one true Foo instance, but are in fact different.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
Pat wrote:So if there are old flavors of the Foo in an ArrayList/Vector, or anywhere, when you change it, you either chase down all the locations, or have two (or more) objects that claim to be the one true Foo instance, but are in fact different.

No - you'd have one or more instances where isValid is false, and you'd have one instance where isValid is true. That's part of the point of adding the isValid field - validate it any time the object's data is accessed. And if you're going to worry about this sort of thing, you need to also consider arrays, as well as every other possible reference that might be held somewhere. I imagine here it would be useful to have a way to look up the current, valid version of an object, using an invalid reference. But the reason I'm specially concerned with maps and sets is that those are the ones that can be completely hosed by changing the equals()/hashCode() of the keys/members. With a Vector (blech) or ArrayList, at least you will still find the object, exactly where you put it. And you could use the object's DB id to load more current data. With a HashMap or HashSet though, you may never see the object again. Unless you iterate all entries, maybe - but that's usually inefficient unless you really need to do something with every element.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

Mike Simmons wrote:
Pat wrote:So if there are old flavors of the Foo in an ArrayList/Vector, or anywhere, when you change it, you either chase down all the locations, or have two (or more) objects that claim to be the one true Foo instance, but are in fact different.

No - you'd have one or more instances where isValid is false, and you'd have one instance where isValid is true.


I'm not seeing any practical value to your approach. isValid or not, you have a bunch of objects that are not correct and you don't know how many of them there are.

Not a winning approach.

Any other ideas out there?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
Pat wrote:you have a bunch of objects that are not correct and you don't know how many of them there are.

And why would it be important to know how many there are? Are you dealing with memory leaks too? Or just looking for a silver bullet to fix all possible problems? (Note to Pete Stein: yeah, I probably should have left it there too.)

I did just suggest checking isValid any time the data is accessed. The idea was, if an object is not valid, and you're accessing its data anyway, throw an exception, or at least log an error. With the first, you can catch this exception and then load a correct version. Or don't catch it, and at least you've got a useful clue in your log file about a probable bug. I can at least see possible value to that. Whether it fixes the particular problem you're concerned with, I dunno.

Having said that, I too would be interested in hearing other suggestions, as this seems to be a hard problem.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

Mike Simmons wrote:And why would it be important to know how many there are? Are you dealing with memory leaks too?


If there are X > 0 bad Foo objects, it really doesn't matter how many there are. I'm not as worried about memory leaks as logical leaks.

I'm terrified that some programmer will come along a year from now and not see the subtle problems.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
Mike Simmons wrote:
Pat wrote:you have a bunch of objects that are not correct and you don't know how many of them there are.
And why would it be important to know how many there are? Are you dealing with memory leaks too?

If there are X > 0 bad Foo objects, it really doesn't matter how many there are. I'm not as worried about memory leaks as logical leaks.

OK. Your part about "and you don't know how many of them there are" was irrelevant line noise. Got it.

So, did you read anything else in my reply?
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4646
    
    5

Mike Simmons wrote:So, did you read anything else in my reply?

yes
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3011
    
  10
Guess there's no point in continuing then.
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Hey, cool down guys. Interesting (hard) problems always create different opinions and let us respect each other's views!

Pat Farrell wrote: While the ID is the primary key in the database, it's a lousy field for telling if humans are equal. Perhaps two otherwise identical records are two people, perhaps they are one person entered twice into the system, and our duplication check code failed. It's a hard problem.


I feel there are two problems here.

  • Programmatic problem of how to make the equals and hashcode method not change when the field values change.
  • Ideological problem of not having the human_id as the only field for hashcode and equals.


  • I feel that since the human id is the primary key in the DB, the ideological limits have already been breached somewhere, so it probably makes sense to breach them everywhere.
    This automatically solves the programmatic problem, as human_id will never change. Am I correct?
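A minimal sketch of equality based only on the id, assuming the id is assigned once and never changes. The class name HumanRecord and the lname field are hypothetical; human_id is from the post.

```java
import java.util.HashSet;
import java.util.Set;

class HumanRecord {
    private final long humanId; // primary key, fixed at construction
    private String lname;       // free to change without affecting the hash

    HumanRecord(long humanId, String lname) {
        this.humanId = humanId;
        this.lname = lname;
    }

    void setLname(String lname) { this.lname = lname; }

    @Override
    public boolean equals(Object o) {
        return o instanceof HumanRecord && humanId == ((HumanRecord) o).humanId;
    }

    @Override
    public int hashCode() { return Long.hashCode(humanId); }
}

class IdEqualityDemo {
    public static void main(String[] args) {
        Set<HumanRecord> people = new HashSet<>();
        HumanRecord h = new HumanRecord(7L, "Smith");
        people.add(h);
        h.setLname("Jones");
        System.out.println(people.contains(h)); // true: the hash depends only on the id
    }
}
```

The trade-off is the one raised earlier in the thread: two records for the same real person with different ids compare as unequal.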

    Moving to Intermediate forum.


    Pat Farrell
    Rancher

    Joined: Aug 11, 2007
    Posts: 4646
        
        5

    Nitesh Kant wrote: Interesting (hard) problems always create different opinions and let us respect each other's views!

    Moving to Intermediate forum.


    Well, you can move it, but as I posted long ago, there are very few threads that stay in Advanced, and once in a while, it would be good to see a hard problem stay there.

    I feel that since the human id is the primary key in the DB, the ideological limits have already been breached somewhere, so it probably makes sense to breach them everywhere.
    This automatically solves the programmatic problem, as human_id will never change. Am I correct?


    I don't agree that a unique key in the database, which is by definition a primary key to the table, is very useful in the application domain. Yet, this is a philosophical judgment, or theological, whatever. But I see it as accidental to the domain problem of when are Human objects equal. And without getting into DNA or arguing over identical twins, two people with the name "Nitesh Kant" are not equal just because some number of fields are equal.

    I'm not trying to impugn @Mike Simmons's view or judgement, but he's not going to convince me, so I'm looking for other ways to have objects that are not technically immutable that you can still handle well.

    The record from the database represents the values/attributes of the Human at one point in time. You have to be able to change it, people get married, typos get fixed. So this cast in stone, one-to-one relationship with the Human object and the record in the RDBMS table is not viable.

    In web applications, it seems that half the code and user time is dealing with the C.R.U.D functions to change names and addresses.
    Nitesh Kant
    Bartender

    Joined: Feb 25, 2007
    Posts: 1638

    Pat Farrell wrote:
    Nitesh Kant wrote: Interesting (hard) problems always create different opinions and let us respect each other's views!

    Moving to Intermediate forum.


    Well, you can move it, but as I posted long ago, there are very few threads that stay in Advanced, and once in a while, it would be good to see a hard problem stay there.


    I feel the complexity of *this* problem is more due to ideology than technology, so I moved it to the Intermediate forum.
    Martijn Verburg
    author
    Bartender

    Joined: Jun 24, 2003
    Posts: 3274
        
        5

    If I couldn't decide which fields were regularly changeable I would personally just use every field <shrug>. My argument being "for an object to be 'truly' equal its members must be equal". It's extra programming work for sure, but it sounds like it would give you the peace of mind needed in this case.

    As an aside, although we don't have hard and fast rules about what type of Q's remain in Advanced, I think many moderators see questions like "I'm hacking the JVM and X happened...." as belonging there. Of course, YMMV.


    Steve Luke
    Bartender

    Joined: Jan 28, 2003
    Posts: 4179
        
      21

    Just to throw my 2 cents into this discussion:

    The human_id may not be a good description for a human, but it is a good description of a unique point of data. You can imagine using a Value Object type of approach where the contents of the instance is only the immutable human_id, while accessor methods to other data point to a back-end/cacheable/updatable Object that never leaves the data layer. Something like this:


    Then the data-layer factory keeps the Human data up to date, takes it out of service when it needs to, etc., and your business layer has a consistent view of the current data with a simple interface and a basic mechanism for maintaining equality with respect to data.
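A sketch of the kind of design described here; it is not the code originally posted. The names Human and HumanFactory appear in the thread, while HumanData, save, get, lookup, and the in-memory map standing in for the database are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Immutable snapshot of the changeable attributes; stays inside the data layer.
final class HumanData {
    final String fname;
    final String lname;

    HumanData(String fname, String lname) {
        this.fname = fname;
        this.lname = lname;
    }
}

// Data-layer factory: owns the current snapshot for each human_id.
class HumanFactory {
    private final Map<Long, HumanData> current = new ConcurrentHashMap<>();

    // Stand-in for a database write: replaces the snapshot for this id.
    void save(long humanId, String fname, String lname) {
        current.put(humanId, new HumanData(fname, lname));
    }

    Human get(long humanId) {
        return new Human(humanId, this);
    }

    HumanData lookup(long humanId) {
        HumanData data = current.get(humanId);
        if (data == null) {
            throw new IllegalStateException("No data for human_id " + humanId);
        }
        return data;
    }
}

// Value object for the business layer: its only state is the immutable id.
final class Human {
    private final long humanId;
    private final HumanFactory factory;

    Human(long humanId, HumanFactory factory) {
        this.humanId = humanId;
        this.factory = factory;
    }

    // Accessors always read the latest snapshot from the data layer.
    String getFname() { return factory.lookup(humanId).fname; }
    String getLname() { return factory.lookup(humanId).lname; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Human && humanId == ((Human) o).humanId;
    }

    @Override
    public int hashCode() { return Long.hashCode(humanId); }
}
```

With this shape, a Human placed in a HashSet stays findable after factory.save(...) changes the name, and getLname() on the same instance returns the new value.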

    You had brought up concerns previously about using the human_id as a good marker for equality because it does not sort out duplicate data points that passed by your data entry duplication checks. This is a problem to be addressed on your data entry side and not on your data access side. Like you said, there is little or no way to make sure that 'Steve Luke' is the same as 'Steve Luke' on the data access side, we could be quite different people (or not). But if we have the same human_id, we can be expected to be the same person. On the other hand, if I already exist in the database, then at the data entry level you can spit back a message 'This guy Steve Luke already exists, do you want to make a new one, or update the old one?' rather easily.


    Steve
    Martijn Verburg
    author
    Bartender

    Joined: Jun 24, 2003
    Posts: 3274
        
        5

    That's a really good point I think you have there, Steve. Wish I'd summed it up like that.
    Pat Farrell
    Rancher

    Joined: Aug 11, 2007
    Posts: 4646
        
        5

    Steve Luke wrote:Then the data-layer factory keeps the Human data up to date, takes it out of service when it needs to, etc., and your business layer has a consistent view of the current data with a simple interface and a basic mechanism for maintaining equality with respect to data.

    You had brought up concerns previously about using the human_id as a good marker for equality because it does not sort out duplicate data points that passed by your data entry duplication checks. This is a problem to be addressed on your data entry side and not on your data access side. Like you said, there is little or no way to make sure that 'Steve Luke' is the same as 'Steve Luke' on the data access side, we could be quite different people (or not). But if we have the same human_id, we can be expected to be the same person. On the other hand, if I already exist in the database, then at the data entry level you can spit back a message 'This guy Steve Luke already exists, do you want to make a new one, or update the old one?' rather easily.


    Interesting, thanks.

    Shouldn't equals() and hashCode() also include the HumanFactory instance? How do you know that there is only one HumanFactory? If you are relying on one and only one, then you have a hidden, and fairly critical, dependency that can break the meaning of equals() and hashCode().

    Years ago I worked on a system for the Veterans Administration. You would not believe how many duplicates they had that were really different people.
    Steve Luke
    Bartender

    Joined: Jan 28, 2003
    Posts: 4179
        
      21

    That depends on the system. If your system has a single source of data, then it would not be necessary either to have a single factory or to use the factory as part of the equality test. It would probably be more efficient to have just one factory, but I don't think it's necessary.

    If on the other hand, your data comes in from different sources, and each source may have duplicate primary keys that represent different data points, then using the factory as part of equality would be a good idea.
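For the multi-source case, one possible shape is a key that combines the source with the id. This is a sketch under assumptions: SourcedHumanKey and the sourceName string are hypothetical, standing in for whatever identifies a factory or data source in a real system.

```java
import java.util.Objects;

final class SourcedHumanKey {
    private final String sourceName; // hypothetical identifier of the data source
    private final long humanId;      // primary key within that source

    SourcedHumanKey(String sourceName, long humanId) {
        this.sourceName = Objects.requireNonNull(sourceName);
        this.humanId = humanId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SourcedHumanKey)) return false;
        SourcedHumanKey other = (SourcedHumanKey) o;
        // Equal only when both the source and the id match.
        return humanId == other.humanId && sourceName.equals(other.sourceName);
    }

    @Override
    public int hashCode() { return Objects.hash(sourceName, humanId); }
}
```

Both fields are final, so the hash code stays stable for the life of the key.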
    Paul Clapham
    Bartender

    Joined: Oct 14, 2005
    Posts: 18541
        
        8

    Unless I missed something (I just joined this thread), I would simply assume that no two of your Thing objects could ever be equal. At least, I didn't see anything where you had a definition of "equal" based on the state of the object. That being the case, I just wouldn't override equals() or hashCode() at all. Let Object deal with those things in the usual way.
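A small sketch of what leaving equals() and hashCode() to Object looks like in practice. The Thing class here is a hypothetical stand-in with one mutable field.

```java
import java.util.HashSet;
import java.util.Set;

// No equals()/hashCode() overrides: identity semantics inherited from Object.
class Thing {
    private String name;

    Thing(String name) { this.name = name; }

    void setName(String name) { this.name = name; }
}

class IdentityDemo {
    public static void main(String[] args) {
        Set<Thing> things = new HashSet<>();
        Thing t = new Thing("Smith");
        things.add(t);
        t.setName("Jones");
        // The identity hash does not depend on field values, so the lookup still works.
        System.out.println(things.contains(t));                    // true
        // Two instances with the same state are still different objects.
        System.out.println(t.equals(new Thing("Jones")));          // false
    }
}
```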
    Alecsandru Cocarla
    Ranch Hand

    Joined: Feb 29, 2008
    Posts: 158
    1. Mutable objects should not be used as keys in HashMaps. I haven't seen mutable objects used as keys anywhere (I mean, not in good-quality code). Usually, keys are simple immutable objects like Strings or Integers. That being said, hashCode() can be implemented any way you want. (The hashCode is not used for values in HashMaps, only for keys, so there's no restriction on keeping your objects as values in maps.)

    2. equals() - it depends what you're trying to do.
    a. If you have a solid id (a database primary key, for example), then that's what you should use in your equals().
    b. Otherwise, your objects don't have identity and, as such, you can use any combination of fields which is helpful (all of them, if all are important).

    Of course, you should also take care not to break the well-known contract between hashCode() and equals().
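A short sketch of point 1: an immutable key (here a Long id) with the mutable object as the value. PersonValue and the id 42 are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Mutable value object; its hashCode() is never consulted by the map below.
class PersonValue {
    private String lname;

    PersonValue(String lname) { this.lname = lname; }

    String getLname() { return lname; }

    void setLname(String lname) { this.lname = lname; }
}

class ImmutableKeyDemo {
    // Returns the name seen through the map after the value has been changed.
    static String renameAndLookup() {
        Map<Long, PersonValue> byId = new HashMap<>();
        PersonValue p = new PersonValue("Smith");
        byId.put(42L, p);
        p.setLname("Jones");              // only the value changes; the key is untouched
        return byId.get(42L).getLname();
    }

    public static void main(String[] args) {
        System.out.println(renameAndLookup()); // Jones
    }
}
```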
