File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Overall performance of ArrayList vs. HashSet

 
Tariq Ahsan
Ranch Hand
Posts: 116
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I have an xml object that contains node with multiple occurences. I am trying come up with a code that would remove such nodes of the xml based on certain conditions. These nodes are already been dumped into an array of Strings. I am capturing the indexes of this array into an ArrayList object which satisfies the conditions for removal from the array. I can put this array of strings into a ArrayList and do a iteration and do remove. Similarly can use HashSet to do the same. Wondering what would perform overall better in removing these elements? Or there is better way to do this in a totally different way?

Thanks

Tariq
 
Norm Radder
Ranch Hand
Posts: 707
3
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Wondering what would perform overall better in removing these elements?

One way to compare different ways is to time them and check the differences.
 
Campbell Ritchie
Sheriff
Pie
Posts: 47229
52
  • 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Asking for performance differences between ArrayList and HashSet is like asking which is faster, a motorbike or a speedboat. On water the motorbike is faster and the speedboat is better on the roads . . . or maybe the other way round.

The two classes give completely different functionality and you need to read about the Set and List interfaces, in the API and Java Tutorials. In fact using a HashSet will lose all your duplicates on insertion.
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Pie
Posts: 15150
31
Android IntelliJ IDE Java Scala Spring
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
First of all, as Campbell already says, an ArrayList and a HashSet are two different things. An ArrayList is an ordered list of elements, internally stored in an array, and a HashSet is an unordered collection, which does not allow duplicates.

You can't say which of the two is "overall faster". Some operations are faster on an ArrayList and some operations are faster on a HashSet, due to how these datastructures are organized internally. The API documentation gives some information about the performance of different operations.

The documentation for ArrayList says:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.

And the documentation of HashSet says:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
 
Tariq Ahsan
Ranch Hand
Posts: 116
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks all for your responses to my sort of dumb question. I'll read into more on List and Set to getter understanding on the two.
Appreciated for your help.
 
Campbell Ritchie
Sheriff
Pie
Posts: 47229
52
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Tariq Ahsan:
Thanks all for your responses to my sort of dumb question.
What was "dumb" about it?

You're welcome.
 
Gamini Sirisena
Ranch Hand
Posts: 378
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A small addition..

The HashSet determines a duplicate by calling the equals method of the objects being added. Unless the equals method of the objects that is being added to the HashSet is properly overridden the HashSet will accept "duplicates".
 
Rob Spoor
Sheriff
Pie
Posts: 20369
44
Chrome Eclipse IDE Java Windows
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
But then of course they aren't duplicates anymore, are they

But you're right, the notion of "duplicate" is determined by hashCode + equals for (Linked)HashMap and by compareTo (or the Comparator) for TreeMap.
 
Wirianto Djunaidi
Ranch Hand
Posts: 210
Ruby Ubuntu VI Editor
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Gamini Sirisena:
A small addition..

The HashSet determines a duplicate by calling the equals method of the objects being added. Unless the equals method of the objects that is being added to the HashSet is properly overridden the HashSet will accept "duplicates".


Considering the name of the class is HashSet, I would think it will rely more on hashCode() method. Of course the common best practice is where a class return true on equals() it should return the same value on hashCode(), so you can kind of say that..but no guarantee
[ September 02, 2008: Message edited by: Wirianto Djunaidi ]
 
Campbell Ritchie
Sheriff
Pie
Posts: 47229
52
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Wirianto Djunaidi:
Considering the name of the class is HashSet, I would think it will rely more on hashCode() method. Of course the common best practice is where a class return true on equals() it should return the same value on hashCode(), so you can kind of say that..but no guarantee


Look at the API documentation, where it tells you how HashSet works (briefly).

If you are not guaranteeing that you get the same hashCode as other objects returning true on equals, you are programming wrongly. Read about hashCode here and notice it says "must" or "must consistently."
 
Gamini Sirisena
Ranch Hand
Posts: 378
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I knew someone was going to pick on that duplicate..

JSE 1.6 API doc says the following on the add method of the HashSet..

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

This does not talk about the hashCode.

However it is true that when you override equals, hashCode must return same value when equals returns true. (But hash codes need not return different values when equals returns false)
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic