File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Beginning Java and the fly likes Overall performance of ArrayList vs. HashSet Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Soft Skills this week in the Jobs Discussion forum!
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "Overall performance of ArrayList vs. HashSet" Watch "Overall performance of ArrayList vs. HashSet" New topic
Author

Overall performance of ArrayList vs. HashSet

Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Hi,

I have an xml object that contains node with multiple occurences. I am trying come up with a code that would remove such nodes of the xml based on certain conditions. These nodes are already been dumped into an array of Strings. I am capturing the indexes of this array into an ArrayList object which satisfies the conditions for removal from the array. I can put this array of strings into a ArrayList and do a iteration and do remove. Similarly can use HashSet to do the same. Wondering what would perform overall better in removing these elements? Or there is better way to do this in a totally different way?

Thanks

Tariq
Norm Radder
Ranch Hand

Joined: Aug 10, 2005
Posts: 690
    
    1
Wondering what would perform overall better in removing these elements?

One way to compare different ways is to time them and check the differences.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40027
    
  28
Asking for performance differences between ArrayList and HashSet is like asking which is faster, a motorbike or a speedboat. On water the motorbike is faster and the speedboat is better on the roads . . . or maybe the other way round.

The two classes give completely different functionality and you need to read about the Set and List interfaces, in the API and Java Tutorials. In fact using a HashSet will lose all your duplicates on insertion.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14420
    
  23

First of all, as Campbell already says, an ArrayList and a HashSet are two different things. An ArrayList is an ordered list of elements, internally stored in an array, and a HashSet is an unordered collection, which does not allow duplicates.

You can't say which of the two is "overall faster". Some operations are faster on an ArrayList and some operations are faster on a HashSet, due to how these datastructures are organized internally. The API documentation gives some information about the performance of different operations.

The documentation for ArrayList says:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.

And the documentation of HashSet says:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
Tariq Ahsan
Ranch Hand

Joined: Nov 03, 2003
Posts: 116
Thanks all for your responses to my sort of dumb question. I'll read into more on List and Set to getter understanding on the two.
Appreciated for your help.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40027
    
  28
Originally posted by Tariq Ahsan:
Thanks all for your responses to my sort of dumb question.
What was "dumb" about it?

You're welcome.
Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 378
A small addition..

The HashSet determines a duplicate by calling the equals method of the objects being added. Unless the equals method of the objects that is being added to the HashSet is properly overridden the HashSet will accept "duplicates".
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19781
    
  20

But then of course they aren't duplicates anymore, are they

But you're right, the notion of "duplicate" is determined by hashCode + equals for (Linked)HashMap and by compareTo (or the Comparator) for TreeMap.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Wirianto Djunaidi
Ranch Hand

Joined: Mar 20, 2001
Posts: 210

Originally posted by Gamini Sirisena:
A small addition..

The HashSet determines a duplicate by calling the equals method of the objects being added. Unless the equals method of the objects that is being added to the HashSet is properly overridden the HashSet will accept "duplicates".


Considering the name of the class is HashSet, I would think it will rely more on hashCode() method. Of course the common best practice is where a class return true on equals() it should return the same value on hashCode(), so you can kind of say that..but no guarantee
[ September 02, 2008: Message edited by: Wirianto Djunaidi ]
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40027
    
  28
Originally posted by Wirianto Djunaidi:
Considering the name of the class is HashSet, I would think it will rely more on hashCode() method. Of course the common best practice is where a class return true on equals() it should return the same value on hashCode(), so you can kind of say that..but no guarantee


Look at the API documentation, where it tells you how HashSet works (briefly).

If you are not guaranteeing that you get the same hashCode as other objects returning true on equals, you are programming wrongly. Read about hashCode here and notice it says "must" or "must consistently."
Gamini Sirisena
Ranch Hand

Joined: Aug 05, 2008
Posts: 378
I knew someone was going to pick on that duplicate..

JSE 1.6 API doc says the following on the add method of the HashSet..

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

This does not talk about the hashCode.

However it is true that when you override equals, hashCode must return same value when equals returns true. (But hash codes need not return different values when equals returns false)
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Overall performance of ArrayList vs. HashSet