• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Set v.s. List

 
John Cage
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I would like to find out which one I should use for the optimal space and computation complexities. So far, ArrayList is what I used to store objects. And there are few properties about the implementations I listed below to make my question more concrete.

- Each object is unique and implements Comparable. In this object, there are two fields that are String. So the function compareTo() is basically implemented based on those two String fields.

- A new object is added into a proper index in the ArrayList in order to maintain the list's sorted order. The proper index is retrieved by using Collections.binarySearch

- Arraylist is sorted based on the two private fields of the object.

I can see that a TreeSet can support these features just as easily. So it comes down to if there is an advantage of using one over the other based on memory and speed.

Thanks.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
As far as I can tell, a TreeSet will consume a little bit more memory and and be faster. It's unlikely that you will see much of a difference for a small number of elements (say <10000), though.

What's more important is that using a TreeSet is simpler: You just insert the element and you don't need to care about the sorting at all.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Seconding Ilja. I would add that for me, the most important reason to use a SortedSet is that it is much more expressive. Good code is a clear, precise and concise expression of programmer intent. This certainly includes the data structures you pick. If you have a collection of objects and you have a strong expectation of them being unique, you can express this expectation by using a Set. If you want to keep it sorted at all times, you should express this by using a SortedSet.

The funny thing is that making your code expressive usually leads to better performance as well. Not always - there are circumstances where you're better off putting things in a List and sorting afterwards - but compromising clarity is something you do only as a last resort. Remember Hoare's dictum: premature optimisation is the root of all evil in programming.

- Peter
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Peter, very well said!
 
John Cage
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks all for the feedback, especially big ups to Ilja. This is a following question: why will a TreeSet consume a little bit more memory? And why is it faster?? Or can you show me your way to come to these conclusions.

Thanks again!
[ August 16, 2004: Message edited by: John Cage ]
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by John Cage:
Thanks all for the feedback, especially big ups to Ilja. This is a following question: why will a TreeSet consume a little bit more memory?
Because it uses a tree data structure to store its information rather than an array. Each node in the tree is an object (TreeMap.Entry) which keeps pointers to its parent, the left branch, and the right branch, the element, and more. An ArrayList is backed by a simple array with the elements.
And why is it faster??
Because to insert an element in an arbitrary position of an ArrayList, on average half the list will have to be shifted by one position. This takes O(N) time; i.e., if your list contains 1,000,000 elements it will take 1,000 times as long as when your list contains 1,000 elements. A TreeSet on the other hand only needs to traverse the depth of the tree and therefore takes O(log(N)) time, i.e., if your set contains 1,000,000 elements it will take only twice as long as when your list contains 1,000 elements.

- Peter
 
John Cage
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Peter. Great analysis.

Just an observation: I tried with TreeSet and ArrayList. When using ArrayList, the program produces OutOfMemoryError. But no error with TreeSet. Hmm...

Thanks again.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by John Cage:
Thanks Peter. Great analysis.


Seconded.

One thing to add, though: Notice that the Big O notation doesn't say anything about absolute performance. So based on it we can't say how the two alternatives behave for small n's. It only says that the more elements we add, the faster TreeSet becomes relative to ArrayList. It might well be that ArrayList is faster for a small amount of elements!


Just an observation: I tried with TreeSet and ArrayList. When using ArrayList, the program produces OutOfMemoryError. But no error with TreeSet. Hmm...


This teaches us another heuristic: don't search for the fastest solution, but for the simplest - it will almost invariably have less bugs...

Thanks again.[/QB]
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic