This week's book giveaway is in the OO, Patterns, UML and Refactoring forum. We're giving away four copies of Refactoring for Software Design Smells: Managing Technical Debt and have Girish Suryanarayana, Ganesh Samarthyam & Tushar Sharma on-line! See this thread for details.

This might sound as an very vague question upfront but it is not. I have gone through Hash Function description on wiki but it is not very helpful to understand.

I am looking simple answers for rather complex topics like Hashing, Here are my questions:

What do we mean by Hashing, how does it work internally [Simplified but complete Explanation] ?
What algorithm does it follow ?
What is the difference between : HashMap, HashTable and HashList ?
What do we mean by Constant Time Complexity and why does different implementation of Hash gives constant time operation ?
Lastly, why in most interview questions Hash and LinkedList are asked, is there any specific logic for it from testing interviewee's knowledge ?

I know my question list is big but would really appreciate if I can get some clear answers to this questions as I really want to understand Hash Topic. This has been eating me up from inside.

Links to simple explanation for the above would be greatly appreciated.

If you go to your JDK installation folder, there is a file called src.zip. This contains the source code of most of the Java API classes.

Sandeep Dharwar wrote: What do we mean by Hashing, how does it work internally [Simplified but complete Explanation] ?

Hash tables have a number of buckets to store their data. This way the hash table doesn't need to go through the entire data set to find a value. Instead, the hash code of the key determines in which bucket the entry goes. When looking up the value for a key, the hash table will only look in the bucket for that key. Similarly, when setting the value for a key, the hash table will put the value in the bucket for that key. This is how hash tables in general work. HashMap and Hashtable are not different.
See http://en.wikipedia.org/wiki/Hash_table for more information.

What is the difference between : HashMap, HashTable and HashList ?

I don't know HashList, but HashMap and Hashtable are almost identical. The main differences are:
- Hashtable is synchronized, HashMap is not.
- HashMap allows null keys and values, Hashtable does not.
- Hashtable extends java.util.Dictionary and therefore has extra methods, and also several methods that do the same thing.

What do we mean by Constant Time Complexity and why does different implementation of Hash gives constant time operation ?

See http://en.wikipedia.org/wiki/Time_complexity.
In short, constant time complexity means that it the difference in time between retrieving the value of a key in a hash table with 10 entries and retrieving the value of a key in a hash table with 10,000,000 entries is virtually absent.
However, only ideal hash tables really have constant time complexity. In reality, multiple entries will be stored in the same bucket. That means that the bucket still must be searched, and this is usually a linear operation (meaning that if you get 100 times as many entries, the search will take 100 times as long). The worse the hash implementation, the more entries there will be in each bucket, and the longer lookups will take. An ideal hash table has only buckets of size 1.

Thanks Rob that was really helpful.. I was looking into the source code for HashMap and was having trouble understanding it. I am not able to piece the whole thing togther.

Should I get an understanding of the basic concepts of Hashing in general to understand the internal working to Hashmap?

At this point I have the basic idea of how hashing works.

It would help to understand the basic principle of hash tables before looking at any specific implementation.

But to help you out a bit: in HashMap, buckets are implemented as classes of an inner class called Entry. This is a linked list of all entries in the same bucket. HashMap keeps one Entry[] that represents all buckets. Most operations operate on that array and its entries. There is an internal hash function that maps the keys' hashCode() return values to an index in this array, because hashCode() can of course return values that are outside of the range of the array.

Rob Spoor wrote:But to help you out a bit: in HashMap, buckets are implemented as classes of an inner class called Entry. This is a linked list of all entries in the same bucket. HashMap keeps one Entry[] that represents all buckets.

@Sandeep: It's perhaps worth mentioning that, although a hashed collection's buckets contain a linked list, they are usually initialized so that the vast majority of buckets will contain only one element. This means that, in most cases, finding a bucket (which is extremely fast) is the same as finding its element, and why it's very important that, as far as possible, distinct objects return unique hashcodes.

It's almost impossible to guarantee; but it is the main distinguishing feature between a "good" hashCode() function and a "bad" one.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here