aspose file tools*
The moose likes Java in General and the fly likes HashSet vs Collections.binarySearch vs TreeSet Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "HashSet vs Collections.binarySearch vs TreeSet" Watch "HashSet vs Collections.binarySearch vs TreeSet" New topic
Author

HashSet vs Collections.binarySearch vs TreeSet

Praveen Kumar Singh
Ranch Hand

Joined: Mar 04, 2009
Posts: 43
Hi,
I have requirement to search a object from list 10-10000 objects, all will be unique and chances of just simple iteration is also almost similar (Consider a phone book case: chance of search ~ chance of iteration )
I am confuse here, which one is better way.

HashSet or Collections.binarySearch or TreeSet

After few hour of googling, i come to know that HashSet gives you O(1) search complexity whereas Collections.binarySearch O(log n).
First of all, i am not getting how O(1) is possible.
Fine, you will get the hashbucket with O(1) complexity, but what after that?
You have to do a liner search in the bucket, where not only your objects but rest of the application's and even your application's objects are present.
and definitely, we have three way to search in Java, so we must have some pros and cons!



Praveen
SCJP, SCWCD, SOA
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19723
    
  20

A HashSet gives you O(1) for adding, removing and checking for presence. For actually finding objects you're right that that's an O(n) operation. If you need to search it a lot then HashSet is probably not the best way. Collections.binarySearch also doesn't look like a good candidate because it requires a sorted List. The sorting itself is O(n log n).

But the question here is - what are you searching on? If it's always the same (unique and immutable) field, then why not use a Map with this field as the keys? A HashMap is O(1) for lookups, so you get good performance.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
D. Ogranos
Ranch Hand

Joined: Feb 02, 2009
Posts: 214
The right choice of Collection depends on your requirements.

If you just need to test for existence of an item, then the optimal choice would be HashSet, as it offers constant time for this. I do not think TreeSet would offer you any advantage here, unless you need to get only the first or last elements.

If you actually need to retrieve the item, then you have to use an ordered List, and use Collections.binarySearch to get an index to the item.

Of course, if the items have an identifying characteristic (a key), then you might also use a (Hash)Map to store them, offering both constant search and retrieval times.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    
  60

Praveen Kumar Singh wrote:You have to do a liner search in the bucket, where not only your objects but rest of the application's and even your application's objects are present.

This seems a little bit messed up to me. A hashset contains only the items you've put in it, so the bucket does not contain "the rest of the application's objects" unless you've explicitly put "the rest of the application's objects" into the hashset. Instances of a HashSet do not share some common memory structure.

and definitely, we have three way to search in Java, so we must have some pros and cons!

Hashing is great if you always access your data using the key and don't mind the items are not sorted when iterating, nor that their order might even change.
TreeSet or TreeMap would be useful if you need to keep your data sorted and can cope with log(n) complexity (which is actually pretty good for most applications).
Lists and binary search are useful when you need to access the collection by numerical index, and occasionally need to find items using the key.

And nothing prevents you from creating a new collection class that would keep its data both in a hashset and a treeset, providing fast lookups and sorted iterations at the same time. Or a class like this. The possibilities are unlimited.
Praveen Kumar Singh
Ranch Hand

Joined: Mar 04, 2009
Posts: 43
@Martin: Hi Martin, thanks for reply, i need to check the documentation to know properly is hash bucket Map's object wise or OS/JVM wise.

@ALL: It seems like everyone is convince that HasMap or Hashing mechanism give the faster result, but as part of my question another requirement was iteration too.
Do any one feel or Know that getting all elements from all buckets and making them a list i.e. entrySet, could be performance hit
In my application, no of time you do searching is almost equivalent to iteration (please check my phone book example)
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Praveen Kumar Singh wrote:@Martin: Hi Martin, thanks for reply, i need to check the documentation to know properly is it hash bucket is Map object wise or OS/JVM wise.


It is absolutely, 100% per HashMap or HashSet instance. There is no sharing across the JVM or OS. That would be a horrible way to implement it, and probably couldn't even be made to work correctly.

A HashMap or HashSet is just another Java class, no different than one you could write yourself. The fact that it's in the core API does not make it special. Each HashMap and HashSet instance maintains its own buckets, with member variables, which are distinct per instance just like any other non-static member variable.

@ALL: It seems like everyone is convince that HasSet or Hashing mechanism give the faster result, but as part of my question another requirement was iteration too.


Iteration is O(N) for all collections, if you use Iterator, whether explicitly or implicitly via foreach loop. If you do something dumb like iterate a List using get(i), then for a LinkedList it will be O(N^2).



In my application, no of time you do searching is almost equivalent to iteration (please check my phone book example)


Eh? One doesn't typically iterate through a phone book. Unless you have some unusual requirement, the most natural model for a phone book would be a HashMap (although the possibility of multiple people with the same name complicates it a little bit).
Praveen Kumar Singh
Ranch Hand

Joined: Mar 04, 2009
Posts: 43
Jeff Verdegan wrote:It is absolutely, 100% per HashMap or HashSet instance. There is no sharing across the JVM or OS.

AND
Iteration is O(N) for all collections,


Thanks Jeff, it was helpful.


Jeff Verdegan wrote:Eh? One doesn't typically iterate through a phone book.

Not convince jeff!!...you, specially in touchscreen phone, people iterate it on phonebook using right side scrollbar, also, i need to support the use case where user can enter in phonebook and press the down button one by one to SCROLL the contacts, try in your phone, it will work :-)



 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: HashSet vs Collections.binarySearch vs TreeSet