Off heap hashmap backed by memory mapped file

 
Greenhorn
Forgive me if this is not the right forum for this post - please direct me to the appropriate forum and I'll post there.

We have a need for an off heap HashMap<String, Object> that can potentially store an unlimited number of entries. The value Object to be stored is variable in size. The HashMap is needed only for the duration of processing an input file. It does not need to be persisted.

We can make our current design (a tree-based data structure) work with a 64-bit JVM, a lot of physical memory, and a heap of at least 8 GB, but many of the target environments don't have these characteristics. For some test cases we're still getting killed by garbage collection.

I think a hashmap backed by a memory-mapped file could be a good solution, but we haven't found one that works yet. As an added point of information, we're looking for something relatively lightweight - a few jar files added to the application rather than a dependency as complex as Hadoop or Apache Ignite.
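To illustrate the kind of thing we have in mind, here is a minimal sketch (class name, slot sizes, and layout are all invented for illustration): a fixed number of fixed-size slots in a MappedByteBuffer over a file, with linear probing. A real implementation would need variable-length records, resizing, deletion, and growth past a single 2 GB mapping, so this is a starting point, not a drop-in solution.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedMapSketch {
    static final int SLOTS = 1 << 12;              // table capacity (power of two)
    static final int KEY_MAX = 60, VAL_MAX = 188;  // fixed per-slot field sizes
    static final int SLOT_BYTES = 1 + 4 + KEY_MAX + 4 + VAL_MAX; // used flag + 2 length-prefixed fields

    final MappedByteBuffer buf;

    public MappedMapSketch(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapped pages live outside the Java heap; the mapping
            // remains valid after the channel is closed.
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) SLOTS * SLOT_BYTES);
        }
    }

    static int slotFor(String key) {
        return key.hashCode() & (SLOTS - 1);  // power-of-two mask; handles negative hashes
    }

    public void put(String key, byte[] val) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        if (k.length > KEY_MAX || val.length > VAL_MAX)
            throw new IllegalArgumentException("record too large for fixed-size slot");
        for (int i = 0, s = slotFor(key); i < SLOTS; i++) {   // linear probing
            int off = ((s + i) & (SLOTS - 1)) * SLOT_BYTES;
            if (buf.get(off) == 0 || Arrays.equals(k, field(off + 1))) {
                buf.put(off, (byte) 1);                       // mark slot used
                putField(off + 1, k);
                putField(off + 1 + 4 + KEY_MAX, val);
                return;
            }
        }
        throw new IllegalStateException("map full");
    }

    public byte[] get(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0, s = slotFor(key); i < SLOTS; i++) {
            int off = ((s + i) & (SLOTS - 1)) * SLOT_BYTES;
            if (buf.get(off) == 0) return null;               // empty slot: key absent
            if (Arrays.equals(k, field(off + 1)))
                return field(off + 1 + 4 + KEY_MAX);
        }
        return null;
    }

    private void putField(int off, byte[] b) {    // length-prefixed write
        buf.putInt(off, b.length);
        for (int j = 0; j < b.length; j++) buf.put(off + 4 + j, b[j]);
    }

    private byte[] field(int off) {               // length-prefixed read
        byte[] b = new byte[buf.getInt(off)];
        for (int j = 0; j < b.length; j++) b[j] = buf.get(off + 4 + j);
        return b;
    }

    public static void main(String[] args) throws IOException {
        MappedMapSketch m = new MappedMapSketch(
                java.nio.file.Files.createTempFile("sketch", ".map"));
        m.put("alpha", "one".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(m.get("alpha"), StandardCharsets.UTF_8)); // prints "one"
    }
}
```

The point is only that the entries themselves never become heap objects the collector has to trace; only the small byte[] copies handed back by get() do.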

Here's what we've looked at, and why each hasn't worked:

Ehcache - fine, up to a point, but very large input requires a Terracotta license, and there are redistribution issues around that.

MapDB - version 1.0.x was mostly OK, but for the more extreme test cases we were running, it still sucked up most of the machine's memory (memory leaks, I assume), which was not acceptable. Version 2 had performance degradation, and version 3 isn't ready yet.

OpenHFT ChronicleMap (and predecessors) - limited to approximately 4 GB on Windows, and must be sized before creation (i.e., not unlimited).

clapper.org FileHashMap - very low memory profile and unlimited size, but slow. It also got killed by garbage collection in large test cases.

JCS - would not support our large test cases. It is also designed more as a caching solution than what we need: there's a chance that older data will be pushed out of the map, which is not acceptable.

OrientDB - very slow in initial testing.

Does anyone have any suggestions for an open source solution for this problem? We really don't want to write our own implementation, but if we can't find something suitable we may have to go that route.

Thanks!
 
Campbell Ritchie
Marshal
Welcome to the Ranch

Off heap? Is that possible? That sounds as though you want persistence, and to me persistence off heap means a database. Try something like MySQL, which is open source.
There are bound to be other suggestions.
 
Mike Rawlins
Greenhorn
Campbell - Thanks for the reply. As I said, no, we don't want persistence. We want off heap to avoid the overhead of garbage collection.
 
Campbell Ritchie
Marshal
What is wrong with garbage collection? It is usually fast enough that you don't notice it.
 
Mike Rawlins
Greenhorn
"java.lang.OutOfMemoryError: GC overhead limit exceeded" is what is wrong with it. In several of our test cases we get this even with a good heap size. Unfortunately, in many of our target hardware/OS environments there are constraints on how much heap can be requested for the JVM.
 
Campbell Ritchie
Marshal
…And what is the largest likely size of the Map? As you doubtless already know, many Maps are backed by an array containing Map.Entry objects. I don't think it is a Map.Entry<K, V>[], because you are not supposed to be able to create arrays of types with <> in their names, so it might be an Object[] into which Map.Entry<K, V> instances are put. You can verify that by inspecting the code of java.util.HashMap.
Now, the code uses a formula similar to myArray[h & (c − 1)], where h is the hash code (after some bit-fiddling) and c the capacity/length of the array. That formula works beautifully as long as c is an exact power of 2; since array sizes are ints, the largest possible table has a capacity of 2³⁰. If you overwhelm the load factor and capacity of that Map (or if you have multiple Entries with the same hash code), your Map.Entry<K, V> objects turn themselves into a linked list, and as you know, linked lists are in theory unlimited in size. But when you work out the size of that Map, you can see you are going to overwhelm the RAM capacity of most computers long before your linked lists reach any size.
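As a quick illustration of that indexing formula (the class and sample values are mine, not taken from java.util.HashMap itself):

```java
public class HashIndexDemo {
    // For a power-of-two capacity c, (h & (c - 1)) keeps the low bits of h,
    // matching a nonnegative h % c - and it also maps negative hash codes
    // into range, which a plain % would not.
    public static int index(int h, int capacity) {
        return h & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println(index(37, 16));   // prints 5 (same as 37 % 16)
        System.out.println(index(-1, 16));   // prints 15, not a negative index
    }
}
```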
Yes, I know you can buy 16 GB of spare RAM for much less than it costs our family for dinner out to celebrate my birthday. The JVM defaults to using 25% of available RAM as (maximum) heap space. You can probably easily pass an option to the JVM which increases that maximum to 75%, especially if you have plenty of RAM to play with, and that sounds simpler to me than trying to persuade the JVM to work outwith its heap. Also, I suspect you will get much slower performance if you have to go outwith your heap.
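If it helps, the ceiling the running JVM is actually working with is easy to check from inside the program; the flags in the comments are standard HotSpot options:

```java
public class HeapCheck {
    // Runtime.maxMemory() reports the -Xmx ceiling the JVM was started with.
    // Typical ways to raise it:
    //   java -Xmx6g YourApp                   (absolute size)
    //   java -XX:MaxRAMPercentage=75 YourApp  (percentage of physical RAM, JDK 10+)
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMiB() + " MiB");
    }
}
```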

Anybody else got any different suggestions?
 
Campbell Ritchie
Marshal

Mike Rawlins wrote:"java.lang.OutOfMemoryError: GC overhead limit exceeded" . . .

Don't know any more about that, I am afraid.

Sorry.
 
Bartender
Have you evaluated Redis? I feel it meets some of your requirements (but not all).

Meets:
- Off the JVM heap, and no JVM garbage collection, because Redis runs in its own process outside the JVM. It's not a Java process.

- Uses swap memory, but only for values, not keys, so all your keys have to fit in available RAM. Also, Redis itself deprecated virtual memory some years back (since version 2.4, I think), so you'd have to use one of the older versions.

- Tunable eviction policy. If you don't want any evictions, that's possible, subject only to the previous point's constraints.

- It's much more lightweight than any Java solution. Written in C.

- Can be used by a JVM application via Jedis, JRedis, or any other client. Jedis is the most popular.

Does not meet:

- Does not officially run on Windows, AFAIK. You'd have to use a third-party port or build from source yourself.

- Can't store Java objects as-is. You'd have to serialize objects to byte arrays to store them, and deserialize on retrieval.
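On that last point, here is a sketch of the byte[] round trip using plain JDK serialization (a hand-rolled format would be faster); the Jedis calls in the comment only show where this plugs in and assume a running Redis server:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RedisValueCodec {
    // Flatten any Serializable value to the byte[] Redis can store.
    public static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from the stored bytes.
    public static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // With a client such as Jedis, the calls would look roughly like:
    //   jedis.set(key.getBytes(), serialize(value));
    //   Object back = deserialize(jedis.get(key.getBytes()));
}
```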
 
Mike Rawlins
Greenhorn
Thanks, Karthik. No, we haven't looked at Redis yet. We were hoping to find something that would run entirely within the same JVM. We'll look at Redis if we can get past that requirement.
 