ArrayList#trimToSize(): this method frees the extra space, which is useful sometimes.

And in HashMap, whenever rehashing happens, the capacity is doubled. If there are very few elements after the rehashing, then a lot of space is wasted. Is there any way to trim it, or am I missing something here?

Should I go for TreeMap when there is a space constraint?

Seetharaman Venkatasamy wrote:
And in HashMap, whenever rehashing happens, the capacity is doubled. If there are very few elements after the rehashing, then a lot of space is wasted. Is there any way to trim it, or am I missing something here?

If your map doubles in size from 1 million buckets to 2 million buckets, and you only need one of the new buckets, and you have a load factor of 0.75, that means you're wasting about 1.25 million buckets. At 8 bytes per reference (a 64-bit JVM without compressed pointers), that's about 10 MB wasted. That's not necessarily horrible. In particular, if you're doing something big enough that you have a million-entry map, you've probably allocated enough memory to the JVM that 10 MB is quite a small fraction of it.

On the other hand, if this happens at the scale of 10,000 elements, then you're wasting about 100 kB. Not something to worry about in most cases. And remember, the waste occurs only if you happen to rehash and then need only a few more entries after that.
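The arithmetic above can be checked with a quick sketch. The bucket count and load factor come from the example; the 8-bytes-per-reference figure is an assumption about the JVM, so the exact numbers will vary:

```java
public class WasteEstimate {
    public static void main(String[] args) {
        long newBuckets = 2_000_000L;               // capacity after the doubling
        double loadFactor = 0.75;
        // Entries just past the old threshold: 1,000,000 * 0.75, plus the
        // one extra put() that triggered the rehash.
        long entries = (long) (1_000_000L * loadFactor) + 1;
        long wastedBuckets = newBuckets - entries;  // buckets with no entry to hold
        long bytesPerReference = 8;                 // assumption: 64-bit JVM, no compressed oops
        double wastedMB = wastedBuckets * bytesPerReference / 1_000_000.0;
        System.out.println("wasted buckets: " + wastedBuckets); // about 1.25 million
        System.out.println("wasted MB: " + wastedMB);           // about 10 MB
    }
}
```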

If it's actually a problem for you (as in, you've measured and it is causing problems), then you could either try to predict ahead of time how big your map will need to be, or, after you've filled it up, copy it to a new map that is appropriately sized to start with, so that there will be no waste. (Note that you'll have to divide the actual number of entries by the load factor in order to prevent a rehash. If you have a load factor of 0.75, and you have 7,500 entries, you'd want your Map's initial capacity to be 10,000.)
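The division described above can be wrapped in a small helper. This is just a sketch; `capacityFor` is a made-up name for illustration, not a JDK method:

```java
import java.util.HashMap;
import java.util.Map;

public class Presizing {
    // Smallest initial capacity that lets `expectedEntries` fit without a
    // rehash: HashMap grows once size exceeds capacity * loadFactor.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int capacity = capacityFor(7_500, 0.75f);
        System.out.println(capacity);  // 10000, as in the example above
        Map<String, Integer> map = new HashMap<>(capacity, 0.75f);
        // ...now up to 7,500 entries can be added without triggering a rehash
    }
}
```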

Should I go for TreeMap when there is a space constraint?

I wouldn't advise a TreeMap to save space on unused buckets. I'd only use a TreeMap if I wanted to keep my keys sorted.

Interesting question. It's true that the capacity of the HashMap doesn't shrink once it's been grown by rehash(), but then it won't need to grow again if you again store a lot of elements in it. It's a rare situation where you need to retain a HashMap after removing a significant number of elements from it. If that's what you really need to do, though, you could iterate through it and add the elements to a pristine, unbloated HashMap. You might be able to subclass HashMap to include a "trimToSize()"-like method, but since the bucket table is a package-private data structure, that might be tricky.

Greg Charles wrote:It's a rare situation where you need to retain the HashMap after removing a significant number of elements in it.

I think he's talking about the case where a put() kicks off a rehash, but then you don't put() any more elements (or put very few) after that. So you end up with a load of 0.375 when you could have had 0.7501 or thereabouts.

Maybe not quite as rare as the situation you describe, but probably still pretty rare, especially with maps large enough that it would be a problem.

If you're using a java.util.HashMap, check out the source code. I believe you will find that the size of the bucket table will always be a power of two. At least, that's the way it was implemented by Sun, and is currently implemented by Oracle. That could change in the future, but don't hold your breath. So it's pretty much impossible to trim to size as the original poster describes, unless you re-implement HashMap itself.

Alternatively, you could look at Hashtable, which has a different enough implementation that my comments above do not apply.

Mike Simmons wrote:If you're using a java.util.HashMap, check out the source code. I believe you will find that the size of the bucket table will always be a power of two.

Oh, right, I forgot about that.

So that negates the suggestion of pre-sizing or copying to a new map after filling.

No. At least the version I'm looking at shows:

That is, it's not always a power of two. Actually, the default initial capacity is 11, but you can change it to whatever you want. The rehash() method doubles the old capacity and adds one. Pre-sizing is definitely an option, as is copying to a new map.

Seetharaman Venkatasamy wrote:
length multiplied by 2

Right. And the constructor I snipped above ensures that it always starts out as a power of 2. So, with the current version of HashMap, it will always be a power of 2 (but Hashtable does not have that restriction).

Jeff Verdegan wrote: the constructor I snipped above ensures that it always starts out as a power of 2.

I'm a bit confused. Only the *initial capacity* is forced to be a power of 2; whenever rehashing occurs, the growth is just a multiplication by 2. (Obviously the resulting number is a power of 2, but the growth rule itself is not constrained that way.)

I'm not sure what you're saying. We start with an initial capacity that's always a power of 2. Then, when we resize, we always multiply by 2. Thus, the number of buckets of a HashMap will always be a power of 2.

Jeff Verdegan wrote:
We start with an initial capacity that's always a power of 2. Then, when we resize, we always multiply by 2. Thus, the number of buckets of a HashMap will always be a power of 2.

Exactly, that is what I meant.

Which seems to be what Jeff was saying in the first place, so it's not clear why you were disagreeing with him. But you seem to agree now.

Interesting. I was looking at the first Google hit for the HashMap source, which seems to be out of date. I would have thought hash tables were well enough understood to get HashMap right in the first implementation, but apparently not!

In any case, HashMap doesn't change what we specify for the load factor, and via the load factor we can affect the threshold for growing the capacity. If the problem is a lot of wasted capacity because the last element we added caused a doubling of capacity, and we don't plan to add any more, then we could fix that by upping the load factor in the constructor, or by copying into a new HashMap with a higher load factor once we're done adding elements. I can't see that being practical, though.

Greg Charles wrote:Interesting. I was looking at the first Google hit for HashMap source, which seems to be out of date. I would have thought hash tables would have been well enough understood to get HashMap right in the first implementation, but apparently not!

As I recall, they found that enough classes had potentially problematic hashCode() methods that it became advantageous to add their own private hash(int) method inside HashMap to convert a given hash into another, better-distributed hash. Once they did this, it was no longer necessary to avoid power-of-two table sizes, and so they switched to those in order to replace the division with slightly faster bit operations. Or something like that.
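The idea looks something like the following. This particular bit-mixing function is offered as an illustration of the technique, not as a quote of any specific JDK version:

```java
public class SupplementalHash {
    // Spread the influence of the high bits of a (possibly poor) hashCode()
    // down into the low bits, since a power-of-two table only looks at the
    // low bits when picking a bucket.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int tableMask = 15;  // a 16-bucket table
        // Two hashes that differ only in their high bits collide without spreading...
        System.out.println((0x10000 & tableMask) == (0x20000 & tableMask)); // true
        // ...but land in different buckets once spread.
        System.out.println((spread(0x10000) & tableMask) == (spread(0x20000) & tableMask)); // false
    }
}
```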

A lot of people will not understand the "faster division with bit shifts" point; maybe a bit more explanation is required.

I haven't checked the source for a long time, but it used to use hash & (arraySize - 1) or similar, which lets you replace the slow division (modulo) operation with a single subtraction and a very fast bitwise AND. That, of course, depends on the array size being 16, 32, 64 or another power of 2. If you ask for an initial capacity of 11, that is probably rounded up to an array size of 16; with the default load factor of 0.75, reaching 12 elements would grow the table to 32. You would have to check the source carefully to see whether the array is enlarged at 12 or at 13 elements.
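To make the bit trick concrete, here is a minimal check (not JDK code): for a non-negative hash and a power-of-two table size, `hash & (size - 1)` picks the same bucket as `hash % size`.

```java
public class MaskVersusModulo {
    public static void main(String[] args) {
        int size = 16;        // must be a power of 2
        int mask = size - 1;  // 0b1111
        for (int hash = 0; hash < 100_000; hash++) {
            // The cheap AND and the slower modulo agree for every
            // non-negative hash, because the low bits *are* the remainder.
            if ((hash & mask) != (hash % size)) {
                throw new AssertionError("mismatch at hash " + hash);
            }
        }
        System.out.println("hash & (size - 1) == hash % size for all tested values");
    }
}
```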

Greg Charles wrote:It's a rare situation where you need to retain the HashMap after removing a significant number of elements in it. If that's what you really need to do though, you could iterate through it and add the elements to a pristine, unbloated HashMap.

Actually, new HashMap<...>(bloatedMap) would work just as well. The copy constructor does not inherit the old capacity from the map; it does not even care whether the given map is a HashMap or another implementation.

But the capacity will still be rounded up to a power of 2, so it will usually end up with as many buckets as the original map. About the only thing you could do is
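(The original code snippet is missing here. A sketch of what it presumably showed, using a hypothetical `trimmedCopy` helper that copies into a map built with load factor 1.0, as the next paragraph describes:)

```java
import java.util.HashMap;
import java.util.Map;

public class TrimmedCopy {
    // Hypothetical helper (not a JDK method): copy into a HashMap whose
    // load factor is 1.0, so the constructor reserves no slack beyond the
    // current number of entries.
    static <K, V> Map<K, V> trimmedCopy(Map<K, V> bloated) {
        Map<K, V> trimmed = new HashMap<>(bloated.size(), 1.0f);
        trimmed.putAll(bloated);
        return trimmed;
    }

    public static void main(String[] args) {
        Map<Integer, String> original = new HashMap<>();
        for (int i = 0; i < 7_000; i++) {
            original.put(i, "value" + i);
        }
        // 7,000 / 0.75 > 8,192, so the original grew to 16,384 buckets;
        // 7,000 <= 8,192, so the copy can make do with 8,192 -- half as many.
        Map<Integer, String> copy = trimmedCopy(original);
        System.out.println(copy.size());  // 7000
    }
}
```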

In this case, if originalMapSize / originalMapLoadFactor > some power of 2, but originalMapSize / 1.0 <= that same power of 2, then we'll end up with a map with half as many buckets as the original.

Overall, though, I'd say this is something not to worry about in the general case, and if it arises as a problem in a particular case, then we need to take a holistic look at that particular case, its requirements, use cases, the situation(s) in which this is happening, etc. Maybe we change our whole approach and don't put all this stuff in a map at once. Maybe we fiddle with the load factor a priori. Maybe we use a different kind of Map. Maybe we just give the JVM more memory.

Greg's post concerned the case where you've removed a significant number of items from a map and want to reclaim the unused space, and what I wrote about that holds. It was a minor thread in this broad discussion, though.

Your remarks on the load factor are true, of course.
