
Cross Reference two ArrayList Duplicates Removal

Michael Labuschagne
Ranch Hand

Joined: May 08, 2007
Posts: 56

You have two ArrayLists containing objects of the same type. You want to process the two lists and remove duplicates (i.e. where an object is deemed to be the same as an object in the other ArrayList). Neither list contains duplicates within itself, and each is sorted by primary key. The primary key is only one of the fields tested to decide whether two objects are the same: an object in list 1 with primary key 'PK1' could have a 'sister' element in list 2 that also has primary key 'PK1' but, because other member values differ, would be deemed not equal. The objects have a method isIdentical(object) which determines whether the two are identical.

So far I have come up with a crude and inefficient method of removing duplicates:



This makes many, many iterations through the array structures. Is there not a better method? Any suggestions are welcome!
Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2376
    

Apache Commons Collections has a utility class called CollectionUtils that can do that for you.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4658
    

I'd use Guava's collections: specifically, put the values in a set and use the difference or symmetric difference methods. If it's fast enough for Google, I'm good.
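
For anyone who wants to see what a symmetric difference computes without pulling in a library, here is a rough plain-JDK sketch. It is not Guava's implementation; the class and method names are made up for illustration, and it assumes the element type has equals() and hashCode() that agree with the notion of "same object" (plain strings stand in for the real objects).

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Guava or the JDK.
final class SetDifferenceSketch {

    /** Returns the elements that occur in exactly one of the two collections. */
    static <T> Set<T> symmetricDifference(Collection<? extends T> a,
                                          Collection<? extends T> b) {
        // Copy both inputs into sets so every membership test is a hash lookup.
        Set<T> setA = new LinkedHashSet<>(a);
        Set<T> setB = new LinkedHashSet<>(b);

        // Start with everything in a, then drop what b also has.
        Set<T> result = new LinkedHashSet<>(setA);
        result.removeAll(setB);

        // Add the elements of b that a does not have.
        for (T item : setB) {
            if (!setA.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Note that this drops both copies of a duplicate. If one copy should survive, a plain one-sided difference (a minus b) plus all of b is closer to what is wanted.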

Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7795
    

Michael Labuschagne wrote:You have two ArrayLists containing objects of the same type in both... You want to process the two lists and remove duplicates (i.e. where an object is deemed to be the same as another object in the other ArrayList)... The objects in the individual ArrayLists do not contain duplicates and are sorted according to primary key (which is one of the test fields to determine two objects are the same - i.e. an object from list 1 with primary key 'PK1' could have a 'sister' element list 2 which also has primary key 'PK1'...

From your description, I suspect you could use a 'stepping' comparison of the two lists based on matching PKs. However, if it was me, I think I'd just load one of the lists (probably the larger one) into a HashSet and run contains().
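
A rough sketch of the 'stepping' idea, in case it helps. Item is a hypothetical stand-in for the real class (only the isIdentical name comes from the original post), and the sketch assumes the PK is unique within each list and that both lists are sorted in the PK's natural String order. It also assumes an identical pair should be dropped from both lists; adjust if one copy should be kept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the poster's object type. */
final class Item {
    final String pk;      // primary key; each list is sorted by it
    final String payload; // stands for the other member values

    Item(String pk, String payload) {
        this.pk = pk;
        this.payload = payload;
    }

    boolean isIdentical(Item other) {
        return pk.equals(other.pk) && payload.equals(other.payload);
    }
}

final class SteppingSketch {

    /** Returns two lists: the survivors of a (index 0) and of b (index 1). */
    static List<List<Item>> dropCrossDuplicates(List<Item> a, List<Item> b) {
        List<Item> keptA = new ArrayList<>();
        List<Item> keptB = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            Item x = a.get(i);
            Item y = b.get(j);
            int cmp = x.pk.compareTo(y.pk);
            if (cmp < 0) {          // x has no partner in b
                keptA.add(x);
                i++;
            } else if (cmp > 0) {   // y has no partner in a
                keptB.add(y);
                j++;
            } else {                // same PK: keep both only if they differ
                if (!x.isIdentical(y)) {
                    keptA.add(x);
                    keptB.add(y);
                }
                i++;
                j++;
            }
        }
        // Whatever is left in either list has no partner in the other.
        keptA.addAll(a.subList(i, a.size()));
        keptB.addAll(b.subList(j, b.size()));

        List<List<Item>> result = new ArrayList<>();
        result.add(keptA);
        result.add(keptB);
        return result;
    }
}
```

Each list is walked once, so the cost is proportional to the combined length of the two lists, and isIdentical() is only called when the PKs match.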

However, in order for that to work, you'll have to make your equals() method do what isIdentical()(*) does currently, and also add a decent hashCode() method.
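
Something along these lines, as a minimal sketch. Row and its two fields are invented for illustration; the real class would compare whatever fields isIdentical() looks at now.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical element type; the fields are placeholders. */
final class Row {
    private final String pk;
    private final String payload;

    Row(String pk, String payload) {
        this.pk = pk;
        this.payload = payload;
    }

    // equals() compares the same fields the isIdentical() check would.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Row)) return false;
        Row other = (Row) o;
        return pk.equals(other.pk) && payload.equals(other.payload);
    }

    // hashCode() must use the same fields as equals().
    @Override
    public int hashCode() {
        return Objects.hash(pk, payload);
    }
}

final class HashSetSketch {

    /** Returns the elements of smaller that have no equal element in larger. */
    static List<Row> withoutMatches(List<Row> smaller, List<Row> larger) {
        Set<Row> lookup = new HashSet<>(larger); // one pass to build the set
        List<Row> kept = new ArrayList<>();
        for (Row r : smaller) {
            if (!lookup.contains(r)) {           // hash lookup per element
                kept.add(r);
            }
        }
        return kept;
    }
}
```

Building a new list avoids repeated ArrayList.remove() calls, each of which shifts the remaining elements.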

Winston

(*)Edit: BTW, that's not a great name for a method in my opinion. To me, "identical" means "has the same reference" (ie, obj1 == obj2).

