I want to create a table in a database. The table should have a key and a value. The table should have entries around 100million but each row is just a 10 digit key and a 7 digit value corresponding to it. I want to load the database in memory.
Instead of using a database table, I have been asked to create a hashtable from the table in the db, on server start up. And load the hashtable in the memory.
I wanted to just use mysql for it but then I was told that no matter how much i optimize it, each query will still have to go through a query processor.
And becuase my query will be just a look up and nothing more complex so
if I create a hashtable in java and load it in memory that overhead could be avoided.
Do you think having a hastable in memory is a better idea than using a database table in mysql?
Also how can i make sure that the hashtable remains cached at all instances?
Why not look into some of the newer key/value databases that have been appearing lately? They are especially adept at handling large volumes of data.
Some possibilities include HBase (from Hadoop), Cassandra and CouchDB. See http://nosql-database.org/ for lots of options.
Do you actually think having a hastable is better than having a DB?
The hashtable will have 17 digits in a row, that means 17*100 million approx 1.7GB of RAM.
I do have a larger RAM size on my machine.
Why do you say that
If it has to page to disk, you might as well not be caching.
I want to minimize the disk i/o that is why I want the hashtable in cache. When an update happens (which is very infrequent), I update the table on the disk and then cache it.
Also, how do I cache the Properties/Hashtable? By running the program once? Is there any way to make sure that the table remains cached?
On my machine (JVM 64-bit) a hashtable with 1000 entries (keys of 10 chars, values of 7 chars) consumed 144196 bytes of memory,
average 144 bytes per one entry - 100 million x 144 gives approx 14.4GB of RAM On 32-bit system memory consumption per one entry should be a little bit smaller, but you will hit a 2GB memory limit per one JVM.
Joined: Dec 21, 2008
Thank you for the input. I am running my program on a machine with 8 cores and 12GB RAM. When i increase the number of entries in the hashtable to more than 1 million, I get a
java.lang.OutOfMemoryError Exception. I am using <Integers, Integer> in the Hashtable and the size of each row/entry is 58 Bytes. With 5 million entries the size of the hashtable will be 290MB. I increased the heap size as: -Xms300m -Xmx500m but I still get the OutOfMemory Error. What could the problem be and how can I resolve it?
If your program needs more memory than the JVM can provide you need to change your program to do whatever you need it to do differently. A 64bit JVM on a machine with 12 GB of RAM gives you a pretty big chunck of available memory.
If you are not sure what is using memory or how it is using it, profile it to find out.
author & internet detective
I don't think you can control the JVM's or OS's memory management without treachery.
You might be able to use JNI to hold on to a huge chunk of memory... but I guess I'm still not convinced this is really necessary. Without knowing the access pattern I wouldn't want to try and optimize.
subject: Hashtable in memory versus database table in memory