my dog learned polymorphism*
The moose likes Beginning Java and the fly likes hash code Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Beginning Java
Bookmark "hash code" Watch "hash code" New topic
Author

hash code

mrudul joshi
Ranch Hand

Joined: Nov 12, 2003
Posts: 54
thanks for previous replies.
why are hashcodes and hashtables needed?why do we need to represent an object in its hash equivalent?
Elouise Kivineva
Ranch Hand

Joined: Feb 07, 2002
Posts: 154
Hashcode are created to create unique keys to identify objects for efficient storage.
More information here:
http://www.cs.berkeley.edu/~jrs/61b/lec/22
(first several paragraphs)
and here
http://www.ncrg.aston.ac.uk/~nabneyit/courses/DSDJ/slides/l1516.pdf
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
It's hard to know in these discussions who you're talking to ... I apologize ahead of time if this is talking down to anyone who doesn't deserve it ...
Those were pretty nice articles. I liked a couple words they used. One was "dictionary", common in Smalltalk. It's a nice concept - given a word, return the definition. The other article used "map" which is much less obvious but common in Java. Java even has an interface called Map. It's roughly the same thing - given a key return a value.
Storing something by key is very useful, and using the Hash of the key is one efficient way to do it. A far slower way would be to store the keys in one array and the values in another so that key[x] goes with value[x]. You can probably think of other ways to store key and value. The nice thing about interfaces is that you're allowed to not care how it's done. Learn to love Map and not worry too much whether the one you're using has hashes or linked lists or trees or whatever inside.


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Dan Walin
Ranch Hand

Joined: Nov 11, 2003
Posts: 109
I'm trying to understand a little more about Hash Tables. I understand the concept that it's quicker to access data (like in a dictionary) using the hashcode (like an index pointing to the data) but how is that implemented in real life? For example, you have a database (DB2, etc), I assume, that contains all this data. To implement a Hash Table you must read every record in the database and "hash" it into a Hash Table? At that point the entire table sits in memory waiting for use? For an extremely huge table wouldn't it be quicker or less costly to the system to just access the database records? Or is a hash table something much different and I don't understand how it can be implemented?
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Yup, you have a good grip on this. HashMap and such are good in-memory techniques for relatively small amounts of stuff.
You might bring a small database table (say state postal codes & names) into memory this way to cache for high-speed access, but it's always a trade-off to use a gob of memory. You might even decide a few thousand rows are worth having in memory.
Or you might have just a "working set" of rows, like the accounts belonging to the customer currently on screen. Load them up when you start working on this customer, discard them when done.
And you can use Maps for things that are never in the database, like the set of all currently open windows or a map of incoming request keys to classes that handle those requests.
Any of those scenarios help?
John Smith
Ranch Hand

Joined: Oct 08, 2001
Posts: 2937
Or is a hash table something much different and I don't understand how it can be implemented?
A hash-based collection guarantees a O(1) (constant time) performance for retreiving the data. What it means is that if you have 100 data items in a container, it will take the same time to find a particular item as it would be if your container had 1000000 data items in it. Compare that with a simple array, where you would need to iterate through every single item to find your match, -- the more items you store, the longer it will take to find a match.
The way a hash-based collection is implemented is pretty simple. Before an item is stored, we calculate its key, a sort of a signature that identifies it. For example, the key might be a combination of the object data fields. Then we store the data item in a particular "bucket", which has a range of hashcodes, say from 1000 to 1100. When you retreive an item, its hashcode is calculated and the container knows in which bucket it is stored. Subsequently, all the keys in that bucket are evaluated using the equals() method, and when it returns true for the item being searched, the value is returned.
Dan Walin
Ranch Hand

Joined: Nov 11, 2003
Posts: 109
good answers - thanks. A good example was code that I had to do last year: Read file1 which is a list of text strings and associated "document type", a file that can contain 1000 records, and then read each record from file2 - list of documents and their descriptions. I had an array that looped a maximum of 1000 times until a match was found (string from file2 matched file1 string) to determine the "document type". I can see a Hash Table making this program run incredibly fast. Thanks again.
Rocky Summers
Ranch Hand

Joined: Nov 07, 2003
Posts: 66
would there be such a thing as a perfect key function? i just couldn't imagine a function that would create a unique key for everything so we dont have to do chaining and everything. moding it with a table size would even make it less likely for the key to be unique, even if the size is a prime number... does anyone know how to create a perfect hash function? sorry to bring this up again.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
I'd bet that's an area that's been researched pretty well, and I'd guess somebody in designing Java made a concious trade-off between possible collisions and speed of hashing. Try a google on hashcode algorithms and see who's been working on it. Intuitively, we're taking a lot of information - possibly multiple parts to a key - and reducing it to a number. We lose a lot of information, so it's probably not possible to prevent two objects from losing their distinctive information and giving the same hashcode.
 
jQuery in Action, 2nd edition
 
subject: hash code