• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Has anyone bothered to develop rules of thumb for HashSet vs TreeSet

 
Rancher
Posts: 4803
7
Mac OS X VI Editor Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I've just listened to a nice MIT lecture on algorithms, where they describe Red/Black balanced trees. This is the algorithm that the Sun Javadocs say is the implementation for the TreeSet/TreeMap code. It nice because its Big-Oh( ( ln N).

I generally use HashSet/HashMap, which is obviously Big-Oh( 1 ) but that ignores the constants. I expect that the constants are not tiny for most implementations of hashCode() and the management.

Most of the time (maybe nearly always) there is going to be no practical impact.

But my curiosity is raised. Has anyone looked at this enough to come up with some rules of thumb when its better to use TreeMap rather that HashMap?
 
Saloon Keeper
Posts: 27763
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Unless my wits are scrambled (again), a HashMap offers non-collated data. A TreeMap is an extension of that paradigm that permits element access in a fixed and pre-defined order. So the rule is that if you intend to iterate over a Hash and you want a specific sequence, use the TreeMap. If any old order will do, save the overhead that ordering the iteration results incurs and use the HashMap.
 
Pat Farrell
Rancher
Posts: 4803
7
Mac OS X VI Editor Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, they define different access methods. And the TreeMap requires that you implement Comparable for the objects.

But I was asking in the abstract. Its clear that calculating and manipulating a Hash takes time. For example, on a String, it looks at every character. The Compare operation can look at as few as one character. So I was wondering for what numbers N does O(1) > O(ln N)
 
Tim Holloway
Saloon Keeper
Posts: 27763
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Although this may not be true for all JVMs, I think the hash value may be a builtin part of the String object in the Sun JVM. It's immutable, so if the storage-versus-CPU tradeoff is worth it, that's practical. In any event, I've never heard anyone complain about the performance. Since the JVM does things to optimize how it handles - and replicates - strings, the cost may very well be paid back long before you started working with collections.

A TreeMap is built on top of a HashMap, so it has the overhead of a HashMap plus the overhead of the ordering mechanism. A TreeSet according to the JavaDocs is built on top of a Tree[b]Map[/i], so it would likewise carry HashMap overhead.

Since collections are so ubiquitous and so criticial, considerable work has been done to make them efficient by Java's designers. Although as a general rule, you'll do best using the simplest mechanism required for the task, the ultimate test is how it performs under your actual production load. In programming, as in philosphy/politics, making major decisions on what "should" be true instead of determining what actually is true can lead to unfortunate results.
 
Ranch Hand
Posts: 225
Eclipse IDE Debian Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tim Holloway wrote:A TreeMap is built on top of a HashMap, so it has the overhead of a HashMap plus the overhead of the ordering mechanism.


As Pat said, TreeMap is implemented using a Red-black tree. It doesn't have any relation to HashMap, except that they both extend AbstractMap.

Note that the Sun implementation of String caches the hash code the first time it is calculated, so after the first time, the call to hashCode() takes constant time.
 
Tim Holloway
Saloon Keeper
Posts: 27763
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hmmm. Time to see the optometrist, it seems. What I read in the JavaDocs last time and what I read this time aren't the same. But this time, what I read concurs with what you said.

Still, it's usually going to be less overhead to calculate a hash than to balance a tree (YMMV).
 
Pat Farrell
Rancher
Posts: 4803
7
Mac OS X VI Editor Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tim Holloway wrote:Still, it's usually going to be less overhead to calculate a hash than to balance a tree (YMMV).



There is more to a hash structure than just calculating the hashCode, you have to handle confilct/collision, resize the map and perhaps change the mapping function as the hash set fills.

The beauty of a red/black tree is that its always balanced, always O (ln N) and fixups are really fast, localized and easy. Its really a very clever structure.

I started out with a bias similar to Tim's, but as I learn more about Red/Black trees, the more impressed I am.
 
Tim Holloway
Saloon Keeper
Posts: 27763
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Pat Farrell wrote:

Tim Holloway wrote:Still, it's usually going to be less overhead to calculate a hash than to balance a tree (YMMV).



There is more to a hash structure than just calculating the hashCode, you have to handle confilct/collision, resize the map and perhaps change the mapping function as the hash set fills.

The beauty of a red/black tree is that its always balanced, always O (ln N) and fixups are really fast, localized and easy. Its really a very clever structure.

I started out with a bias similar to Tim's, but as I learn more about Red/Black trees, the more impressed I am.



The right tool for the job. In actuality, most Hashmap usage doesn't get that sophisticated. The hashtable size is generally fixed, as is the hash algorithm. When the hash value is cached with the object, the overhead becomes minimal, assuming the object gets referenced multiple times. Hash collisions are generally managed via singly-linked lists, which can be traversed very rapidly, since often there's no more than 3 or 4 elements in even the longest overflow list and don't require a whole lot of logic to maintain. For absolute maximum access, you can calculate a "perfect hash", which is simply the art of designing the hash to eliminate collisions. There are a number of good articles on how to do this.

This isn't to say that I always use hashes and never use balanced trees. Hashes are often a lot less memory-efficient, for one thing. I've seen a number of databases where the selection of hash versus tree was selectable on a per-index basis.

There's no "best" answer, so do your homework, measure performance, and above all, stay flexible.

And, if it's not a performance-critical part of the system, don't agonize too much over whether you made the "right" choice. Software offers enough agony without making up any more.
 
reply
    Bookmark Topic Watch Topic
  • New Topic