Hashtable in memory versus database table in memory

 
mayank gupta
Ranch Hand
Posts: 78
I want to create a table in a database with a key column and a value column. The table will have around 100 million entries, but each row is just a 10-digit key and a corresponding 7-digit value. I want to load the table into memory.

Instead of querying the database table directly, I have been asked to build a hashtable from the table in the db at server start-up and load that hashtable into memory.
I wanted to just use MySQL for it, but I was told that no matter how much I optimize it, each query would still have to go through a query processor. Because my query is just a lookup and nothing more complex, that overhead could be avoided if I create a hashtable in Java and keep it in memory.


Do you think having a hashtable in memory is a better idea than using a database table in MySQL?
Also, how can I make sure that the hashtable stays cached at all times?
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34178
You could keep the data in a property file on disk and use Properties to read it in. Since Properties is a hashtable, that gives you an optimized way to load it.

You've confirmed you have enough memory to store this giant hashtable, right? If it has to page to disk, you might as well not be caching.
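
For what it's worth, a minimal sketch of that approach, assuming the data lives in a file called lookup.properties (the file name and the key used in the lookup are just examples):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class LookupCache {
    public static void main(String[] args) throws IOException {
        Properties lookup = new Properties();
        // Read the whole key/value file into memory once, at startup.
        FileInputStream in = new FileInputStream("lookup.properties");
        try {
            lookup.load(in);
        } finally {
            in.close();
        }
        // From here on, every lookup is a plain in-memory hash lookup.
        System.out.println(lookup.getProperty("1234567890"));
    }
}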
 
Peter Johnson
author
Bartender
Posts: 5852
Why not look into some of the newer key/value databases that have been appearing lately? They are especially adept at handling large volumes of data.
Some possibilities include HBase (from Hadoop), Cassandra and CouchDB. See http://nosql-database.org/ for lots of options.
 
mayank gupta
Ranch Hand
Posts: 78
Do you actually think having a hashtable is better than having a DB?
Each row of the hashtable holds 17 digits, so 17 * 100 million is approximately 1.7 GB of RAM.
My machine has more RAM than that.

Why do you say that "If it has to page to disk, you might as well not be caching"?

I want to minimize disk I/O; that is why I want the hashtable in cache. When an update happens (which is very infrequent), I update the table on disk and then cache it again.

Also, how do I cache the Properties/Hashtable? By running the program once? Is there any way to make sure that the table remains cached?

I will also look at the NoSQL databases.
 
Ireneusz Kordal
Ranch Hand
Posts: 423
mayank gupta wrote:
Each row of the hashtable holds 17 digits, so 17 * 100 million is approximately 1.7 GB of RAM.
My machine has more RAM than that.

Read this article first: http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html?page=1
Download the utility (see the resources on the last page) and do a simple test.

On my machine (64-bit JVM), a hashtable with 1000 entries (keys of 10 chars, values of 7 chars) consumed 144,196 bytes of memory, an average of about 144 bytes per entry. 100 million x 144 bytes gives approximately 14.4 GB of RAM.
On a 32-bit system the memory consumption per entry should be a little smaller, but you will hit a roughly 2 GB memory limit per JVM.
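
If you don't want to download the utility, a rough version of the same test can be done with Runtime alone; the numbers are only approximate, since they depend on when the garbage collector actually runs:

import java.util.Hashtable;

public class HashtableFootprint {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Hashtable<String, String> table = new Hashtable<String, String>();
        for (int i = 0; i < 1000; i++) {
            // 10-character keys and 7-character values, as in the original question
            table.put(String.format("%010d", i), String.format("%07d", i));
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approximate bytes per entry: " + (after - before) / table.size());
    }
}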
 
mayank gupta
Ranch Hand
Posts: 78
Hi,

Thank you for the input. I am running my program on a machine with 8 cores and 12 GB of RAM. When I increase the number of entries in the hashtable to more than 1 million, I get a java.lang.OutOfMemoryError. I am using <Integer, Integer> in the Hashtable and I estimate the size of each row/entry at 58 bytes, so with 5 million entries the hashtable should be about 290 MB. I increased the heap size with -Xms300m -Xmx500m but I still get the OutOfMemoryError. What could the problem be and how can I resolve it?
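
For reference, a stripped-down sketch of the kind of test I am running (simplified, not the exact code; the heap flag in the comment is only an example):

import java.util.Hashtable;

public class HashtableStressTest {
    public static void main(String[] args) {
        // Run with a generous heap, for example:  java -Xmx2g HashtableStressTest
        Hashtable<Integer, Integer> table = new Hashtable<Integer, Integer>();
        for (int i = 0; i < 5000000; i++) {
            // Autoboxing creates a separate Integer object for the key and the value,
            // which adds per-entry overhead on top of the Hashtable$Entry itself.
            table.put(i, i);
        }
        System.out.println("Entries: " + table.size());
    }
}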
 
Paul Sturrock
Bartender
Posts: 10336
Increase the heap size. If you want to find out what is using the memory you will need to profile your application.
 
mayank gupta
Ranch Hand
Posts: 78
No matter how much I increase the heap size, I keep getting the error.
What is the significance of the minimum heap size? How much should that be?
 
mayank gupta
Ranch Hand
Posts: 78
Can anybody please suggest something? I am not able to run my code beyond 3 million entries.
 
Paul Sturrock
Bartender
Posts: 10336
If your program needs more memory than the JVM can provide, you need to change your program to do what it does differently. A 64-bit JVM on a machine with 12 GB of RAM gives you a pretty big chunk of available memory.

If you are not sure what is using memory or how it is using it, profile it to find out.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34178
mayank gupta wrote:Why do you say that "If it has to page to disk, you might as well not be caching"?

I want to minimize disk I/O; that is why I want the hashtable in cache. When an update happens (which is very infrequent), I update the table on disk and then cache it again.

Right. But if the cache doesn't fit in memory, you are paging to disk, and then it doesn't minimize disk I/O.

mayank gupta wrote:Also, how do I cache the Properties/Hashtable? By running the program once? Is there any way to make sure that the table remains cached?

You cache something by reading it into memory. Presumably you have code to do that if you have a cache. I don't know of a way to make sure that segment of memory remains cached in RAM.
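
To make that concrete: the "cache" is nothing more than a map your code fills at startup and refills after an update. A bare-bones sketch (the table and column names here are made up, and connection handling is left out):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class StartupCache {
    private volatile Map<String, String> cache = new HashMap<String, String>();

    // Call once at server startup, and again after each (infrequent) update.
    public void reload(Connection conn) throws SQLException {
        Map<String, String> fresh = new HashMap<String, String>();
        Statement stmt = conn.createStatement();
        try {
            ResultSet rs = stmt.executeQuery("SELECT lookup_key, lookup_value FROM lookup");
            while (rs.next()) {
                fresh.put(rs.getString(1), rs.getString(2));
            }
        } finally {
            stmt.close(); // also closes the ResultSet
        }
        cache = fresh; // swap in the freshly loaded map
    }

    public String get(String key) {
        return cache.get(key);
    }
}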
 
David Newton
Author
Rancher
Posts: 12617
I don't think you can control the JVM's or OS's memory management without treachery.

You might be able to use JNI to hold on to a huge chunk of memory... but I guess I'm still not convinced this is really necessary. Without knowing the access pattern I wouldn't want to try to optimize.
 