HashTable : initial capacity and load factor

jose chiramal
Ranch Hand

Joined: Feb 12, 2010
Posts: 266
Can someone please explain these two concepts:

initial capacity and load factor

Also, please point me to a good example that uses these concepts.

I got this statement from the API doc:

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

What does a bucket mean here?
Sebastian Janisch
Ranch Hand

Joined: Feb 23, 2009
Posts: 1183
Ok, let me try ...

A Hashtable is backed by an array with an initial capacity (e.g. 20), and each slot of that array is a bucket that can hold several entries.

So, when you add an object to the Hashtable, an index into the array is calculated (we'll see how later) and the object is stored at that index. If two different objects get assigned the same index, they end up in the same bucket.

How does all that work? Objects have a hashCode() method, which returns an int. At its best, the hashCode is different for all objects that are meaningfully different from each other.

When calculating the array index from the hashCode, something like this happens:
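A minimal sketch of that calculation (my own illustration; the sign-bit mask is what java.util.Hashtable uses to keep the index non-negative):

Object key = "some key";                              // any object to be stored
int capacity = 20;                                    // number of buckets
int index = (key.hashCode() & 0x7FFFFFFF) % capacity; // always in 0..19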



So, if the capacity of the Hashtable is 20, the object's hashCode is taken modulo 20, which always yields a value between 0 and 19 --> the array index. You can see that the greater the capacity, the better distributed the values will be. At its best, every object gets its own index (or bucket). This is important because upon retrieval the Hashtable calculates the index, looks up that bucket, and finds 0 to n objects placed in it. It then has to iterate over all objects found and use their equals method to determine the object we are looking for.

That is why it's important to have a good distribution among the indexes: in the worst case (e.g. the index is always the same) all objects land in the same bucket, and whenever you want to retrieve an object the Hashtable has to iterate over all of them (which is pretty much what a LinkedList does).
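To make the chaining concrete, here is a toy sketch (my own made-up TinyTable, not the real java.util.Hashtable source; like Hashtable it assumes non-null keys):

class TinyTable<K, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = new Entry[20]; // fixed capacity, just for the sketch

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length; // hashCode --> bucket index
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        buckets[i] = new Entry<K, V>(key, value, buckets[i]); // prepend to the bucket's chain
        // (a real table would first replace the value of an existing equal key)
    }

    public V get(K key) {
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) { // equals() picks the right entry within the bucket
                return e.value;
            }
        }
        return null;
    }
}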

The load factor is the fill threshold of the table: once the number of stored entries reaches capacity * load factor, the capacity is increased to ensure a better distribution.

Let's say the load factor is 0.75 and we have an initial capacity of 20. Once the table holds 20 * 0.75 = 15 elements, the next insertion increases the capacity (roughly doubling it; java.util.Hashtable actually grows to 2 * 20 + 1 = 41 buckets) and the elements get redistributed. Now that we have about twice as many buckets available, the elements get distributed better and object retrieval is fast as a beam of light ;-) ..
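Both parameters can be passed straight to the real constructor; here is a small runnable sketch (the class name and keys are just made up for the demo):

import java.util.Hashtable;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // 20 buckets to start with; a rehash is due once the
        // entry count reaches 20 * 0.75 = 15
        Map<String, Integer> table = new Hashtable<String, Integer>(20, 0.75f);
        for (int i = 1; i <= 16; i++) {
            table.put("key" + i, i); // the 16th put should trigger the rehash internally
        }
        System.out.println(table.size()); // prints 16; the resize is invisible to callers
    }
}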

Hope that makes it clear.



JDBCSupport - An easy-to-use, light-weight JDBC framework
Tomasz Kaczmarzyk
Greenhorn

Joined: Mar 05, 2010
Posts: 3

I have a related question about the hashCode() % capacity concept.

In many SCJP mocks, and in the Kathy Sierra & Bert Bates book, there are questions like this:

class Test {
    int i;
    Test(int i) { this.i = i; }
    public boolean equals(Object o) { return ((Test) o).i == this.i; } // hashCode() NOT overridden
}

HashSet s = new HashSet();
s.add(new Test(1));
s.add(new Test(1));

System.out.println(s.size()); // what will be the output?



Now the problem: the authors say that 2 is the correct answer -- "because hashCode() is not overridden, so the two objects will go to different buckets."

BUT! I think that even when each hashCode is unique, there is a possibility that hash1 % capacity == hash2 % capacity, AND then the output is 1.

What do you think about it?
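I even tried to provoke that case (the Probe class and the hash values 1 and 17 are just made up for this experiment; 16 is HashMap's default bucket count):

import java.util.HashSet;
import java.util.Set;

class Probe {
    final int i;
    final int h;
    Probe(int i, int h) { this.i = i; this.h = h; }
    public boolean equals(Object o) {
        return o instanceof Probe && ((Probe) o).i == this.i;
    }
    public int hashCode() { return h; } // deliberately distinct hashCodes for equal objects
}

public class BucketCollisionTest {
    public static void main(String[] args) {
        Set<Probe> s = new HashSet<Probe>();
        s.add(new Probe(1, 1));
        s.add(new Probe(1, 17)); // 1 % 16 == 17 % 16, so without the JDK's internal
                                 // hash spreading these two would share a bucket
        System.out.println(s.size()); // printed 2 for me
    }
}

It printed 2 either way: as far as I can tell from the OpenJDK source, HashMap stores each entry's full hash value and compares it before ever calling equals(), so even a genuine bucket collision between different hashCodes would not merge the two objects into one.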
 