This week's book giveaway is in the OCPJP forum.
We're giving away four copies of OCA/OCP Java SE 7 Programmer I & II Study Guide and have Kathy Sierra & Bert Bates on-line!
See this thread for details.
The moose likes Java in General and the fly likes why actually does hashing help improve the performance of collections? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of OCA/OCP Java SE 7 Programmer I & II Study Guide this week in the OCPJP forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "why actually does hashing help improve the performance of collections?" Watch "why actually does hashing help improve the performance of collections?" New topic
Author

why actually does hashing help improve the performance of collections?

Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 652
I have read that hashing is used to improve performance of collections such as hashset, hashmap etc.

But why does it improve performance. I read that this technique involves Keeping elements with same hashcode in a hash bucket. But how does this improve performance.

thanks
Saifuddin Merchant
Ranch Hand

Joined: Feb 08, 2009
Posts: 606

I think an example would help here. Lets say you are storing the name of all employee in your organization.
e.g. Alan, Matt, Mark, Sam, Sunil

If you store them normally without hashing, searching for a single employee would mean looking at all 5 names (you could use some searching technique, but still need to look at 5 elements)

Now lets use a very simple hash called "First Letter"

A - Alan
M - Matt, Mark
S - Sam, Sunil

Searching for employee starting with 'S' would mean find the hash bucket and then looking only within that hash bucket which improves performance.

A hash function should
a) Distribute the data-set into a smaller set of hash buckets
-- so a hash function that distributes 100 elements in 1 bucket or in 100 buckets is bad. If it distributes it in 10 buckets is good
b) Each 'bucket' should be uniform in size
-- so in hash function above if one bucket contains 91 elements and other buckets contain 1 element each it is bad. If each bucket contains 10 elements it is good

Note: Hash called "First Letter" is not really very good, since we would have more name starting with some letters like 'R' or 'S' v/s names with say 'X' 'Y' or 'Z'

Cheers - Sam.
Twisters - The new age Java Quiz || My Blog
Mitch Aman
Greenhorn

Joined: Nov 24, 2013
Posts: 9

Think of some other collections you may have used (Array, ArrayList, Linked List, etc.), and how the data was stored; one after another. The way to find the data you were looking for is simply traverse through the structure until you found it, which may take a long time with very large amounts of data.

Hashing the data will assign data a specific address (or bucket). Now to find the data, you can look directly to that address- and there it is. Occasionally, multiple data items will be stored in the same address (known as a collision) and a resolution must be implemented.

A good way to think of the advantage: Trying to find a friend. Using a collection like an Array, you would need to stop at every house in town looking for this friend, until you arrived at his house. However, with Hashing you're given his address. You can find your friend's house directly. A good way to think of the buckets is as apartment complexes. You've got the address, but now you need search each apartment individually.
Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 652
thanks all

A good way to think of the advantage: Trying to find a friend. Using a collection like an Array, you would need to stop at every house in town looking for this friend, until you arrived at his house. However, with Hashing you're given his address. You can find your friend's house directly. A good way to think of the buckets is as apartment complexes. You've got the address, but now you need search each apartment individually.


Does this not mean that a further classification would have further increased the performance. Why only classisification of Apartment,why now State then City then Apartment.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18914
    
  40

Monica. Shiralkar wrote:
Does this not mean that a further classification would have further increased the performance. Why only classisification of Apartment,why now State then City then Apartment.


It is only an analogy used to explain why it is faster in searching for your friend. If you are going to take the analogy literally, it is going to break down.... To continue using the analogy (and completely destroy it), the hashing also controls the world that your friends lives in -- as the population grows the number of houses grows; You can get to each house via teleportation (ie. meaning there is no concept of closer houses and farther houses). And it is, if not done right, is not that accurate -- sometimes, the address given is for a house already occupied, in which case, the people has to share the house.

In this inaccurate condition, yes, perhaps more classification would be better, but remember, the original classification is supposed to work to the point where it is not needed -- no house sharing.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 652
thanks
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
 
subject: why actually does hashing help improve the performance of collections?