I am trying to clear up my concepts about Strings. Following is the code I tried:
Why is this happening? According to my understanding, if the Strings are the same then there is only one instance of that String in memory. Since s, s1, s2 and s3 are the same, I expected s, s1, s2 and s3 to refer to the same address in memory. They display the same hashCode(), so that's OK. But why do I get false when I do (s == s1), (s == s2) and (s == s3)? My concepts are shattered. Please help me clear them up. I expected the three statements System.out.println(s == s1); System.out.println(s == s2); System.out.println(s == s3); to print true. Why are they printing false? I realize that == compares references: if the references are the same it returns true, otherwise it returns false. So since the String constant is the same (i.e. Hello World), the references should point to the same memory location, and it should print true....
hashCode() is a method that returns an int computed from an object's contents; for a String it always returns the same number for the same characters. How the String is stored in the computer's memory affects how == behaves. The String class's equals() method compares two strings and decides whether they are equal no matter where they are in memory.
The hash code is calculated from the contents of the String. You can see an overview of how it is calculated in the Sun JVM by looking at the API documentation for that method.
So knowing that the hash code is always calculated based on the actual Characters in the String, it makes sense that the hash code is always going to be the same for Strings that contain identical Characters.
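To make "calculated from the characters" concrete, here is a minimal sketch that recomputes a String's hash code by hand using the formula given in the String.hashCode() API documentation, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int arithmetic. The class and method names (StringHashSketch, manualHash) are hypothetical, made up for illustration; this is not the JDK's own source.

```java
public class StringHashSketch {
    // Hypothetical helper: recomputes String.hashCode() from the documented formula
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], letting int overflow wrap around.
    static int manualHash(String s) {
        int h = 0; // the empty string hashes to 0
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule: same result as the sum of powers of 31
        }
        return h;
    }

    public static void main(String[] args) {
        String text = "Hello, World";
        // Only the characters matter, so a literal and a new String give the same value.
        System.out.println(manualHash(text) == text.hashCode());             // true
        System.out.println(manualHash(new String(text)) == text.hashCode()); // true
    }
}
```

Because nothing but the characters goes into the calculation, where the String lives in memory cannot affect the result.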
However, nothing states that a hashing function must return different values for different contents. It is quite plausible for two different contents to produce the same hash code. For example, on my computer I get the same results for "aa" and "bB". So no matter whether I constructed the Strings or just used them as literals, I still got the same hash code for two different strings. This is allowed by the contract for hash codes.
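A small sketch of the "aa" / "bB" collision mentioned above. The class and method names are hypothetical. With the documented formula, "aa" is 97*31 + 97 and "bB" is 98*31 + 66, and both come to 3104, yet equals() still reports them as different.

```java
public class HashCollisionDemo {
    // Hypothetical helper: true when two Strings share a hash code but are not equal.
    static boolean collides(String x, String y) {
        return x.hashCode() == y.hashCode() && !x.equals(y);
    }

    public static void main(String[] args) {
        String a = "aa";
        String b = new String("bB"); // constructed with new rather than used as a literal
        System.out.println(a.hashCode()); // 3104
        System.out.println(b.hashCode()); // 3104
        System.out.println(a.equals(b));  // false
        System.out.println(collides(a, b)); // true
    }
}
```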
Hash codes therefore cannot be used to determine equality. (Actually, I suppose you could write a class whose algorithm produces unique values for distinct objects, but this is likely to defeat the purpose of hashing, namely to quickly get a known value for an object that lets you look that object up in a collection such as a HashMap.)
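As a rough illustration of why a collection needs both pieces: a HashMap uses hashCode() to narrow the search down to a bucket and then equals() to pick the right key within it. In this sketch (class and method names are hypothetical), the colliding keys "aa" and "bB" are stored as two separate entries and each lookup still finds the correct value.

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeysDemo {
    // Hypothetical helper: a map whose two keys have the same hash code but are not equal.
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("aa", 1);
        map.put("bB", 2); // same hashCode() as "aa", but equals() says it is a different key
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildMap();
        System.out.println(map.size());    // 2
        System.out.println(map.get("aa")); // 1
        System.out.println(map.get("bB")); // 2
    }
}
```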
But to make this a bit simpler (at the risk of painting a slightly inaccurate picture), imagine we had 10 memory locations:
Now let's try adding some code that allocates some constants. At the end of that, only one memory location has been used:
And all three variables point to it (so s1, s2, and s3 all point to the constant at location #7 in the constant memory pool).
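A minimal sketch of the constant-pool behaviour described above, using hypothetical variable names s1, s2 and s3 to mirror the explanation (this is not the original poster's code, which is not shown). String literals with the same contents are shared, so == compares the same reference and prints true.

```java
public class LiteralPoolDemo {
    // Hypothetical helper returning three literals with identical contents.
    static String[] literals() {
        String s1 = "Hello, World";
        String s2 = "Hello, World";
        String s3 = "Hello, World";
        return new String[] {s1, s2, s3};
    }

    public static void main(String[] args) {
        String[] s = literals();
        System.out.println(s[0] == s[1]); // true: both refer to the one pooled object
        System.out.println(s[0] == s[2]); // true
    }
}
```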
Whereas when we use "new" to create a new object, it is allocated on the heap. Now we have some heap space being used:
Now as to how this affects you:
The hashCode for s1, s2, s3, s4, and s5 is always computed from the actual characters in the String "Hello, World", so in all five cases the hash code will be identical.
However, as previously stated, this is useless to you for determining equality - it is quite plausible that another string could have an identical hash code.
For objects, the "==" operator compares references, i.e. the memory locations of the two objects. So in my examples s1, s2, and s3 all point to the same memory location in the Constant Memory Pool, and the "==" operator will show that they are the same object.
However, s4 and s5 each point to a different object (memory locations 1 and 3 on the heap), so the "==" operator will show them as being different objects.
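A short sketch of the heap case, again with hypothetical names s4 and s5 matching the explanation. Every use of new String(...) creates a fresh object, so two constructed Strings are never == to each other, nor to the pooled literal, even though their characters match.

```java
public class NewStringDemo {
    // Hypothetical helper: two Strings built with new, each a separate heap object.
    static String[] constructed() {
        String s4 = new String("Hello, World");
        String s5 = new String("Hello, World");
        return new String[] {s4, s5};
    }

    public static void main(String[] args) {
        String[] s = constructed();
        String s1 = "Hello, World"; // pooled literal
        System.out.println(s[0] == s[1]); // false: two separate objects
        System.out.println(s[0] == s1);   // false: heap object vs. pooled literal
    }
}
```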
The String .equals() method will look at the individual characters in the String and determine whether each Character is equal.
It will indicate that the contents of s1, s2, s3, s4, and s5 are all equivalent, even though they are in different memory locations. So it all depends on what you want to determine.
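Putting the three checks side by side: this sketch (the compare helper is hypothetical) compares a pooled literal with a String built by new. The hash codes match and equals() returns true because the characters match, while == returns false because they are different objects.

```java
public class StringComparisonSummary {
    // Hypothetical helper: returns {same hashCode, equals(), same reference}.
    static boolean[] compare(String a, String b) {
        return new boolean[] {a.hashCode() == b.hashCode(), a.equals(b), a == b};
    }

    public static void main(String[] args) {
        String s1 = "Hello, World";             // pooled literal
        String s4 = new String("Hello, World"); // separate heap object
        boolean[] r = compare(s1, s4);
        System.out.println("same hashCode: " + r[0]); // true
        System.out.println("equals():      " + r[1]); // true
        System.out.println("==:            " + r[2]); // false
    }
}
```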
If you want a fast way to look something up in a collection then you want to use the hash function (actually the collection classes will use the hashing function themselves - you don't really need to worry about it unless you are implementing your own classes that need to be stored in a collection).
If you need to know whether two Strings contain the same characters, then you need to use the .equals() method.
If you need to know whether two Strings are actually the same object using the same memory location, then you can use the == operator.
Great explanation, Nicholas and Andrew. Thanks a lot, it really helped me. One request: is there any article on the net or a book that explains this in more detail? Your explanation was great, Andrew. Thanks again.