• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

why actually does hashing help improve the performance of collections?

 
Ranch Hand
Posts: 2925
13
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have read that hashing is used to improve performance of collections such as hashset, hashmap etc.

But why does it improve performance. I read that this technique involves Keeping elements with same hashcode in a hash bucket. But how does this improve performance.

thanks
 
Ranch Hand
Posts: 608
Firefox Browser Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think an example would help here. Lets say you are storing the name of all employee in your organization.
e.g. Alan, Matt, Mark, Sam, Sunil

If you store them normally without hashing, searching for a single employee would mean looking at all 5 names (you could use some searching technique, but still need to look at 5 elements)

Now lets use a very simple hash called "First Letter"

A - Alan
M - Matt, Mark
S - Sam, Sunil

Searching for employee starting with 'S' would mean find the hash bucket and then looking only within that hash bucket which improves performance.

A hash function should
a) Distribute the data-set into a smaller set of hash buckets
-- so a hash function that distributes 100 elements in 1 bucket or in 100 buckets is bad. If it distributes it in 10 buckets is good
b) Each 'bucket' should be uniform in size
-- so in hash function above if one bucket contains 91 elements and other buckets contain 1 element each it is bad. If each bucket contains 10 elements it is good

Note: Hash called "First Letter" is not really very good, since we would have more name starting with some letters like 'R' or 'S' v/s names with say 'X' 'Y' or 'Z'
 
Greenhorn
Posts: 9
Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Think of some other collections you may have used (Array, ArrayList, Linked List, etc.), and how the data was stored; one after another. The way to find the data you were looking for is simply traverse through the structure until you found it, which may take a long time with very large amounts of data.

Hashing the data will assign data a specific address (or bucket). Now to find the data, you can look directly to that address- and there it is. Occasionally, multiple data items will be stored in the same address (known as a collision) and a resolution must be implemented.

A good way to think of the advantage: Trying to find a friend. Using a collection like an Array, you would need to stop at every house in town looking for this friend, until you arrived at his house. However, with Hashing you're given his address. You can find your friend's house directly. A good way to think of the buckets is as apartment complexes. You've got the address, but now you need search each apartment individually.
 
Monica Shiralkar
Ranch Hand
Posts: 2925
13
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
thanks all

A good way to think of the advantage: Trying to find a friend. Using a collection like an Array, you would need to stop at every house in town looking for this friend, until you arrived at his house. However, with Hashing you're given his address. You can find your friend's house directly. A good way to think of the buckets is as apartment complexes. You've got the address, but now you need search each apartment individually.



Does this not mean that a further classification would have further increased the performance. Why only classisification of Apartment,why now State then City then Apartment.
 
author
Posts: 23951
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Likes 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Monica. Shiralkar wrote:
Does this not mean that a further classification would have further increased the performance. Why only classisification of Apartment,why now State then City then Apartment.



It is only an analogy used to explain why it is faster in searching for your friend. If you are going to take the analogy literally, it is going to break down.... To continue using the analogy (and completely destroy it), the hashing also controls the world that your friends lives in -- as the population grows the number of houses grows; You can get to each house via teleportation (ie. meaning there is no concept of closer houses and farther houses). And it is, if not done right, is not that accurate -- sometimes, the address given is for a house already occupied, in which case, the people has to share the house.

In this inaccurate condition, yes, perhaps more classification would be better, but remember, the original classification is supposed to work to the point where it is not needed -- no house sharing.

Henry
 
Monica Shiralkar
Ranch Hand
Posts: 2925
13
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
thanks
 
Don't get me started about those stupid light bulbs.
reply
    Bookmark Topic Watch Topic
  • New Topic