aspose file tools*
The moose likes Performance and the fly likes Set v.s. List Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Soft Skills this week in the Jobs Discussion forum!
JavaRanch » Java Forums » Java » Performance
Bookmark "Set v.s. List" Watch "Set v.s. List" New topic
Author

Set v.s. List

John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
I would like to find out which one I should use for the optimal space and computation complexities. So far, ArrayList is what I used to store objects. And there are few properties about the implementations I listed below to make my question more concrete.

- Each object is unique and implements Comparable. In this object, there are two fields that are String. So the function compareTo() is basically implemented based on those two String fields.

- A new object is added into a proper index in the ArrayList in order to maintain the list's sorted order. The proper index is retrieved by using Collections.binarySearch

- Arraylist is sorted based on the two private fields of the object.

I can see that a TreeSet can support these features just as easily. So it comes down to if there is an advantage of using one over the other based on memory and speed.

Thanks.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
As far as I can tell, a TreeSet will consume a little bit more memory and and be faster. It's unlikely that you will see much of a difference for a small number of elements (say <10000), though.

What's more important is that using a TreeSet is simpler: You just insert the element and you don't need to care about the sorting at all.


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Seconding Ilja. I would add that for me, the most important reason to use a SortedSet is that it is much more expressive. Good code is a clear, precise and concise expression of programmer intent. This certainly includes the data structures you pick. If you have a collection of objects and you have a strong expectation of them being unique, you can express this expectation by using a Set. If you want to keep it sorted at all times, you should express this by using a SortedSet.

The funny thing is that making your code expressive usually leads to better performance as well. Not always - there are circumstances where you're better off putting things in a List and sorting afterwards - but compromising clarity is something you do only as a last resort. Remember Hoare's dictum: premature optimisation is the root of all evil in programming.

- Peter
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Peter, very well said!
John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Thanks all for the feedback, especially big ups to Ilja. This is a following question: why will a TreeSet consume a little bit more memory? And why is it faster?? Or can you show me your way to come to these conclusions.

Thanks again!
[ August 16, 2004: Message edited by: John Cage ]
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Originally posted by John Cage:
Thanks all for the feedback, especially big ups to Ilja. This is a following question: why will a TreeSet consume a little bit more memory?
Because it uses a tree data structure to store its information rather than an array. Each node in the tree is an object (TreeMap.Entry) which keeps pointers to its parent, the left branch, and the right branch, the element, and more. An ArrayList is backed by a simple array with the elements.
And why is it faster??
Because to insert an element in an arbitrary position of an ArrayList, on average half the list will have to be shifted by one position. This takes O(N) time; i.e., if your list contains 1,000,000 elements it will take 1,000 times as long as when your list contains 1,000 elements. A TreeSet on the other hand only needs to traverse the depth of the tree and therefore takes O(log(N)) time, i.e., if your set contains 1,000,000 elements it will take only twice as long as when your list contains 1,000 elements.

- Peter
John Cage
Greenhorn

Joined: Aug 13, 2004
Posts: 8
Thanks Peter. Great analysis.

Just an observation: I tried with TreeSet and ArrayList. When using ArrayList, the program produces OutOfMemoryError. But no error with TreeSet. Hmm...

Thanks again.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by John Cage:
Thanks Peter. Great analysis.


Seconded.

One thing to add, though: Notice that the Big O notation doesn't say anything about absolute performance. So based on it we can't say how the two alternatives behave for small n's. It only says that the more elements we add, the faster TreeSet becomes relative to ArrayList. It might well be that ArrayList is faster for a small amount of elements!


Just an observation: I tried with TreeSet and ArrayList. When using ArrayList, the program produces OutOfMemoryError. But no error with TreeSet. Hmm...


This teaches us another heuristic: don't search for the fastest solution, but for the simplest - it will almost invariably have less bugs...

Thanks again.[/QB]
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Set v.s. List