how to trim the hashmap size?

Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

ArrayList#trimToSize(): this method frees the extra space, which is useful sometimes.

In a HashMap, whenever rehashing happens the capacity is doubled. If there are very few elements after the rehashing, then a lot of space is wasted. Is there any way to trim it, or am I missing something here?

Should I go for a TreeMap where there is a space constraint?
Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

Interestingly, http://www.coderanch.com/t/580255/Performance/java/Overuse-Java-HashMap was posted before mine in the Performance forum.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Seetharaman Venkatasamy wrote:
In a HashMap, whenever rehashing happens the capacity is doubled. If there are very few elements after the rehashing, then a lot of space is wasted. Is there any way to trim it, or am I missing something here?


If your map doubles from 1 million buckets to 2 million buckets and you only need one of the new buckets, then with a load factor of 0.75 the resize was triggered at roughly 750,000 entries, so about 1.25 million of the 2 million buckets sit empty. At 4 bytes per reference, that's about 5 MB wasted. That's not necessarily horrible. In particular, if you're doing something big enough that you have a million-entry map, you've probably allocated enough memory to the JVM that 5 MB is quite a small fraction of it.

Do you, in fact, have a million-entry HashMap?

On the other hand, if this happens at the scale of 10,000 elements, you're wasting on the order of 50 kB. Not something to worry about in most cases. And remember, the waste occurs only if a put() happens to trigger a rehash and you then add only a few more entries after it.

If it's actually a problem for you (as in, you've measured and it is causing problems), then you could either try to predict ahead of time how big your map will need to be, or, after you've filled it up, copy it to a new map that is appropriately sized to start with, so that there will be no waste. (Note that you'll have to divide the actual number of entries by the load factor in order to prevent a rehash. If you have a load factor of 0.75, and you have 7,500 entries, you'd want your Map's initial capacity to be 10,000.)
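
A minimal sketch of that "copy to a right-sized map" idea, assuming a plain HashMap is acceptable (the rightSize helper and class name below are illustrative, not part of any JDK API):

import java.util.HashMap;
import java.util.Map;

public class MapTrim {
    // Copies the entries of 'bloated' into a new HashMap whose initial capacity
    // is just large enough to hold them without triggering a rehash.
    // Note that HashMap may still round the requested capacity up internally.
    static <K, V> Map<K, V> rightSize(Map<K, V> bloated, float loadFactor) {
        int capacity = (int) (bloated.size() / loadFactor) + 1;
        Map<K, V> trimmed = new HashMap<K, V>(capacity, loadFactor);
        trimmed.putAll(bloated);
        return trimmed;
    }
}

For example, with 7,500 entries and a load factor of 0.75f, rightSize requests a capacity of 10,001, matching the division described above.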

Should I go for a TreeMap where there is a space constraint?


I wouldn't advise a TreeMap to save space on unused buckets. I'd only use a TreeMap if I wanted to keep my keys sorted.
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2853
    
  11

Interesting question. It's true that the capacity of the HashMap doesn't shrink once it's been grown by rehash(), but then it won't need to grow again if you later store a lot of elements in it again. It's a rare situation where you need to retain the HashMap after removing a significant number of elements from it. If that's what you really need to do, though, you could iterate through it and add the elements to a pristine, unbloated HashMap. You might be able to subclass HashMap to include a "trimToSize()"-like method, but since the bucket table, the relevant data structure, is package-private, that might be tricky.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Greg Charles wrote:It's a rare situation where you need to retain the HashMap after removing a significant number of elements from it.


I think he's talking about the case where a put() kicks off a rehash, but then you don't put() any more elements (or put very few) after that. So you end up with a load of 0.375 when you could have had 0.7501 or some such.

Maybe not quite as rare as the situation you describe, but probably still pretty rare, especially with maps large enough for it to be a problem.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
    
  10
If you're using a java.util.HashMap, check out the source code. I believe you will find that the size of the bucket table will always be a power of two. At least, that's the way it was implemented by Sun, and it's the way it is currently implemented by Oracle. That could change in the future, but don't hold your breath. So it's pretty much impossible to trim to size as the original poster describes, unless you re-implement HashMap itself.

Alternatively, you could look at Hashtable, which has a different enough implementation that my comments above do not apply.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Mike Simmons wrote:If you're using a java.util.HashMap, check out the source code. I believe you will find that the size of the bucket table will always be a power of two.


Oh, right, I forgot about that.

So that negates the suggestion of pre-sizing or copying to a new map after filling.
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2853
    
  11

Jeff Verdegan wrote:
Mike Simmons wrote:If you're using a java.util.HashMap, check out the source code. I believe you will find that the size of the bucket table will always be a power of two.


Oh, right, I forgot about that.

So that negates the suggestion of pre-sizing or copying to a new map after filling.


No. At least the version I'm looking at shows:
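
[The quoted snippet did not survive extraction; judging from the next sentence it was the rehash() of that older implementation, roughly along these lines:]

// grow the table to roughly double its size, keeping an odd capacity
int newCapacity = oldCapacity * 2 + 1;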



That is, it's not always a power of two. Actually, the default initial capacity is 11, but you can change that to whatever you want. The rehash() method doubles the old capacity and adds one. Pre-sizing is definitely an option, as is copying to a new map.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Greg Charles wrote:
No. At least the version I'm looking at shows:



That is, it's not always a power of two.


My 1.6 source has this:
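
[The quoted code is missing here; in the JDK 1.6 HashMap(int, float) constructor the requested capacity is rounded up roughly like this:]

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;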


And it doesn't have what you show, although Hashtable does.

So I guess Hashtable does not always have a power-of-2 capacity, but HashMap does.
Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

I have Java 1.6 update 18.
From HashMap,
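
[The quoted line is missing here; in the JDK 1.6 HashMap.addEntry() the resize call is roughly:]

if (size++ >= threshold)
    resize(2 * table.length);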


the table length is multiplied by 2
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Seetharaman Venkatasamy wrote:
the table length is multiplied by 2


Right. And the constructor I snipped above ensures that it always starts out as a power of 2. So, with the current version of HashMap, it will always be a power of 2 (but Hashtable does not have that restriction).
Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

Jeff Verdegan wrote: the constructor I snipped above ensures that it always starts out as a power of 2.

I am a bit confused. Only the *initial capacity* is forced to a power of 2, and then whenever rehashing occurs the *growth rate* is just multiply-by-2. (Obviously the resulting number is a power of 2, but the growth rate itself is not.)
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Seetharaman Venkatasamy wrote:
Jeff Verdegan wrote: the constructor I snipped above ensures that it always starts out as a power of 2.

I am a bit confused. Only the *initial capacity* is forced to a power of 2, and then whenever rehashing occurs the *growth rate* is just multiply-by-2. (Obviously the resulting number is a power of 2, but the growth rate itself is not.)


I'm not sure what you're saying. We start with an initial capacity that's always a power of 2. Then, when we resize, we always multiply by 2. Thus, the number of buckets of a HashMap will always be a power of 2.
Seetharaman Venkatasamy
Ranch Hand

Joined: Jan 28, 2008
Posts: 5575

Jeff Verdegan wrote:
We start with an initial capacity that's always a power of 2. Then, when we resize, we always multiply by 2. Thus, the number of buckets of a HashMap will always be a power of 2.

Exactly, that is what I meant.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
    
  10
Which seems to be what Jeff was saying in the first place. So it's not clear why you were disagreeing with him. But you seem to agree now.
Greg Charles
Sheriff

Joined: Oct 01, 2001
Posts: 2853
    
  11

Interesting. I was looking at the first Google hit for HashMap source, which seems to be out of date. I would have thought hash tables would have been well enough understood to get HashMap right in the first implementation, but apparently not!

In any case, HashMap doesn't change what we specify for load factor, and via the load factor we can affect the threshold for growing capacity. If the problem is a lot of wasted capacity because the last element we added caused a doubling of capacity, and we don't plan to add any more, then we could fix that by upping the load factor in the constructor, or copying into a new HashMap with a higher load factor once we were done adding elements. I can't see that being practical though.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
    
  10
Greg Charles wrote:Interesting. I was looking at the first Google hit for HashMap source, which seems to be out of date. I would have thought hash tables would have been well enough understood to get HashMap right in the first implementation, but apparently not!

As I recall, they found that enough classes had potentially problematic hashCode() methods that it became advantageous to add their own private hash(int) method inside HashMap to convert a given hash to another, better-distributed hash. Once they did this, it was no longer necessary to avoid power-of-two hashtable sizes, and so they switched to those in order to take advantage of slightly faster division using bit shifts. Or something like that.
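
For reference, the supplemental hash in the JDK 1.6 HashMap looks roughly like the following (quoted from memory, so treat it as approximate):

static int hash(int h) {
    // Spreads higher-order bits downward so that poor hashCode() values
    // still differ in the low bits used for power-of-two indexing.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}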
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39084
    
  23
A lot of people will not understand faster division with bit shifts. Maybe a bit more explanation is required.
I haven't checked the source for a long time, but it used to use hash & (arraySize - 1) or similar, which lets you replace the slow division operation with fast addition (subtraction can be performed with addition and a two's-complement transformation) and a very fast bitwise AND. That, of course, depends on the array size being 16, 32, 64 or another power of 2. If the initial capacity is 11, that probably means a default array size of 16 and a load factor of 75%; when you reach 12 elements it would increase the size to 32. You would have to check carefully in the source whether the array is enlarged at content = 12 or content = 13.
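
A small illustration of that trick (the method mirrors HashMap's internal indexFor(); the names here are just for illustration):

// Maps a hash to a bucket index. When 'length' is a power of two, the mask
// gives the same result as (hash % length) for a non-negative hash, with no division.
static int indexFor(int hash, int length) {
    return hash & (length - 1);
}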
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39084
    
  23
Moving thread, as it is too difficult for “beginning”.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    
  60

Greg Charles wrote:It's a rare situation where you need to retain the HashMap after removing a significant number of elements from it. If that's what you really need to do, though, you could iterate through it and add the elements to a pristine, unbloated HashMap.

Actually, new HashMap<...>(bloatedMap) would work just as well. The copy constructor does not inherit the old capacity from the map; it does not even care whether the given map is a HashMap or another implementation.
Jeff Verdegan
Bartender

Joined: Jan 03, 2004
Posts: 6109
    
    6

Martin Vajsar wrote:
Greg Charles wrote:It's a rare situation where you need to retain the HashMap after removing a significant number of elements from it. If that's what you really need to do, though, you could iterate through it and add the elements to a pristine, unbloated HashMap.

Actually, new HashMap<...>(bloatedMap) would work just as well. The copy constructor does not inherit the old capacity from the map;


But the capacity will still be a power of 2, so it will end up with as many buckets as the original map. About the only thing you could do is
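
[The snippet is missing here; presumably it created the copy with a load factor of 1.0, something along these lines:]

// copy into a map sized for exactly the current number of entries
Map<K, V> trimmed = new HashMap<K, V>(originalMap.size(), 1.0f);
trimmed.putAll(originalMap);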


In this case, if originalMapSize / originalMapLoadFactor > some power of 2 but originalMapSize / 1.0 < that power of 2 then we'll end up with a map with half as many buckets as the original.

Overall, though, I'd say this is something not to worry about in the general case, and if it arises as a problem in a particular case, then we need to take a holistic look at that particular case, its requirements, use cases, the situation(s) in which this is happening, etc. Maybe we change our whole approach and don't put all this stuff in a map at once. Maybe we fiddle with the load factor a priori. Maybe we use a different kind of Map. Maybe we just give the JVM more memory.



Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3610
    
  60

Greg's post concerned the case where you've removed a significant number of items from a map and want to reclaim the unused space, and what I wrote on that holds. It was a minor thread in this broad discussion, though.

Your remarks on the load factor are true, of course.
 