JavaRanch » Java Forums » Java » Java in General
how to reduce runtime size of HashMap

Madhan B Babu
Greenhorn

Joined: Nov 16, 2010
Posts: 5
Hi

I have a use case where I am pushing tens of thousands of entries into a HashMap.

The structure of the HashMap is:
HashMap<String1, Map<String2, Map<String3, String4>>>

Here String1 is always unique, but String3 and String4 are very frequently repeated strings. They represent the Status and Priority values in each "calendar week" (String2).

The String3 (Status) and String4 (Priority) values can only come from a predefined set of 10 to 15 strings.

Now, when I use the code below to serialize the HashMap,
----------------------------------------------------------------------------------------------------------------
File file = new File("C:\\testData1.ser");
FileOutputStream fos = new FileOutputStream(file);
ObjectOutputStream oos = new ObjectOutputStream(new DeflaterOutputStream(fos));
oos.writeObject(MasterChartingData);
oos.flush();
oos.close(); // also finishes the deflater and closes the underlying stream
fos.close();
----------------------------------------------------------------------------------------------------------------
and the code below to deserialize,
----------------------------------------------------------------------------------------------------------------
File file = new File("C:\\testData1.ser");
FileInputStream fis = new FileInputStream(file);
ObjectInputStream ois = new ObjectInputStream(new InflaterInputStream(fis));
HashMap<String, Map<String, Map<String, String>>> DeserializedMasterChartingData =
        (HashMap<String, Map<String, Map<String, String>>>) ois.readObject();
ois.close();
fis.close();
----------------------------------------------------------------------------------------------------------------

The repetition of the strings is identified and removed, and the storage file size drops to ~600 KB, whereas normal serialization, without the deflater/inflater streams, creates a file of ~6 MB.

But the problem is that, as the number of entries increases, the Java runtime cannot handle the growing size of the HashMap.
Is there an efficient way, right at the time of constructing the HashMap, to identify repeated data and avoid storing it, so that the HashMap is built memory-efficiently?

regards
mad
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
After you obtain a String3 or String4 value from wherever you get them, try calling intern() on the string, and use the value returned by intern() instead of the original value.

Don't do this for any string that you expect may take many, many different values, as it can cause problems: interned strings are hard to garbage collect. But for strings that you know are confined to a small, finite set of values you can afford to keep in memory for the life of the program, it's fine, and should accomplish exactly what you need.
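A minimal sketch of the idea (variable and class names here are illustrative, not from the original post):

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    public static void main(String[] args) {
        Map<String, String> week = new HashMap<String, String>();
        // simulate freshly parsed values, which would otherwise each be a
        // distinct String instance even when the text is identical
        String status = new String("Open");
        String priority = new String("Major");
        // store the canonical interned instance instead of the new copy,
        // so every "Open" in the map shares one String object
        week.put("Status", status.intern());
        week.put("Priority", priority.intern());
        // interned strings are reference-equal to their string literals
        System.out.println(week.get("Status") == "Open");    // true
        System.out.println(week.get("Priority") == "Major"); // true
    }
}
```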

You could also encode the String3 and String4 data in various other ways. Perhaps each could be represented by an enum, using the enum's valueOf() method to look up the constant for a given string. But I expect that will give you almost exactly the same memory usage as intern() will.
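As a sketch of the enum alternative (enum and constant names are taken from the value sets described later in the thread):

```java
public class EnumLookupDemo {
    // one shared constant per allowed value; valueOf() maps the incoming
    // string to that constant, so no duplicate objects are ever created
    enum Status { Open, Resolved, Closed }
    enum Priority { Major, Critical, Minor }

    public static void main(String[] args) {
        Status s = Status.valueOf("Open");
        Priority p = Priority.valueOf("Critical");
        System.out.println(s + " " + p); // Open Critical
    }
}
```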

There may be more compact ways to encode the hashmaps, especially the last one. But I doubt the saving will be worth the complexity. The intern() method will save you much more memory than any subsequent encoding tricks. Probably.
Deepak Bala
Bartender

Joined: Feb 24, 2006
Posts: 6657

Madhan, please avoid creating duplicate posts. You can find my reply to your query here -> http://www.coderanch.com/forums/posts/list/531720#2411074

It's pretty much what Mike suggested in the latter part of his post.


Madhan B Babu
Greenhorn

Joined: Nov 16, 2010
Posts: 5
Hi

HashMap<String[1], Map<String[2], Map<String[3], String[4]>>>

It is to keep track of the Status and Priority values associated with an object represented by String[1] on each day.
Hence String[1] is unique and the other strings are repeating values.

String[2] will have values related to the day of the year.
String[3] will be either "Status" or "Priority".
String[4] will be any one value from {Major, Critical, Minor} or {Open, Resolved, Closed}.

regards
mad
Madhan B Babu
Greenhorn

Joined: Nov 16, 2010
Posts: 5
Hi

Thanks for the response.

I am now able to build the HashMap with more than 100,000 key-value pairs, and also to serialize it successfully using a DeflaterOutputStream; the file size is ~2.5 MB.

But I am getting an OutOfMemoryError when I deserialize the map back into the JVM using an InflaterInputStream.

Since I am storing it as a single whole object, ois.readObject() builds the whole object graph back at runtime, which obviously doesn't use intern() for the HashMap construction.

regards
mad
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
Well, from what you've now told us, all the data from String3 and String4 could easily be compressed into about 4 bits per calendar week (there are only nine status/priority combinations). But we probably don't need to be that extreme. Just use something like this:
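A sketch of what that holder class might look like (the class name CalendarWeekData comes from the thread; the field and enum layout is an assumption based on the value sets given above):

```java
import java.io.Serializable;

// A compact holder for one calendar week's data: two enum fields instead
// of an inner Map<String, String>, so each week costs two references
// rather than a whole map with duplicated string keys and values.
public class CalendarWeekData implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Status { Open, Resolved, Closed }
    enum Priority { Major, Critical, Minor }

    private final Status status;
    private final Priority priority;

    public CalendarWeekData(Status status, Priority priority) {
        this.status = status;
        this.priority = priority;
    }

    public Status getStatus() { return status; }
    public Priority getPriority() { return priority; }
}
```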

Then your original

Map<String[1], Map<String[2], Map<String[3], String[4]>>>

becomes

Map<String[1], Map<String[2], CalendarWeekData>>
Madhan B Babu
Greenhorn

Joined: Nov 16, 2010
Posts: 5
Hi Mike

I had tried using an enum and building the Map with those objects, but it resulted in an OutOfMemoryError, which is when I switched to a multi-level Map with only Strings as keys and values. With the use of .intern(), I am able to put more than 1 million entries into the Map.
To be very specific, there was a PermGen out-of-space error, but only after 1.2 million entries...

Now the problem is with deserialization, where it tries to inflate the whole Map at once and results in an OutOfMemoryError.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2969
Well, I'd be interested in seeing exactly what you tried with the enums, because it really seems like that should work.

There are still many things to try, but it's hard to predict which will work best.

One possibility is, instead of writing the entire base HashMap to the file at once, write individual Map.Entry objects, one at a time. When you deserialize, create a new HashMap from scratch, and then read one Map.Entry at a time, and put its key and value into the HashMap. Perhaps breaking the process up this way will allow garbage collection to work more effectively.

If that doesn't work, you could break things up further by putting, say, 10000 Map.Entry objects in one file (using one ObjectOutputStream). Then do the same for the next 10000 entries, using a new OOS, and a new file. Repeat until all entries have been written. To read, reverse the process.
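One way to realize the entry-at-a-time idea (a sketch; class and method names are assumptions). Note that it writes each key and value separately rather than the Map.Entry itself, since HashMap's internal entry objects are not serializable; deflater/inflater streams can be layered in exactly as in the earlier snippets:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class EntryStreamDemo {
    // Write the map size, then each key/value pair separately, instead of
    // one huge object graph.
    static void save(Map<String, String> map, File file) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)));
        try {
            oos.writeInt(map.size());
            for (Map.Entry<String, String> e : map.entrySet()) {
                oos.writeObject(e.getKey());
                oos.writeObject(e.getValue());
            }
        } finally {
            oos.close();
        }
    }

    // Rebuild the map incrementally, interning repeated values on the way
    // in, so duplicates never accumulate in memory.
    static Map<String, String> load(File file)
            throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try {
            int n = ois.readInt();
            Map<String, String> map = new HashMap<String, String>();
            for (int i = 0; i < n; i++) {
                String key = (String) ois.readObject();
                String value = (String) ois.readObject();
                map.put(key, value.intern());
            }
            return map;
        } finally {
            ois.close();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("entries", ".ser");
        Map<String, String> m = new HashMap<String, String>();
        m.put("CW01", "Open");
        m.put("CW02", "Open");
        save(m, f);
        System.out.println(load(f).equals(m)); // true
        f.delete();
    }
}
```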

You also might add a readResolve() method (described in the Serializable API). To do this, you need a custom class to hold things in, like my CalendarWeekData. Maybe something like this (modified to use Strings rather than enums, since those were more successful so far). There are many ways to do this, but as long as we're doing it, we might as well limit the number of CalendarWeekData objects too. After all, there are only 9 different combinations of 3 different status strings with 3 different priority strings, so 9 different CalendarWeekData objects should be sufficient. (Here the CalendarWeekData objects should be immutable, to ensure they can be safely re-used.)
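A sketch of that flyweight-with-readResolve approach (the class name is from the thread; the cache and factory method are assumptions):

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Immutable holder whose readResolve() returns a shared cached instance,
// so deserialization never keeps more than the 9 possible
// status/priority combinations in memory.
public final class CalendarWeekData implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String status;   // "Open", "Resolved", or "Closed"
    private final String priority; // "Major", "Critical", or "Minor"

    private static final Map<String, CalendarWeekData> CACHE =
            new HashMap<String, CalendarWeekData>();

    private CalendarWeekData(String status, String priority) {
        this.status = status;
        this.priority = priority;
    }

    // Factory method: always hand out the one cached instance per combination.
    public static synchronized CalendarWeekData of(String status, String priority) {
        String key = status + '/' + priority;
        CalendarWeekData d = CACHE.get(key);
        if (d == null) {
            d = new CalendarWeekData(status.intern(), priority.intern());
            CACHE.put(key, d);
        }
        return d;
    }

    // Called by the serialization machinery after an instance is read: the
    // freshly deserialized duplicate is discarded in favor of the cached one.
    private Object readResolve() {
        return of(status, priority);
    }

    public String getStatus() { return status; }
    public String getPriority() { return priority; }
}
```

With this in place, reading a million entries still only ever holds 9 distinct CalendarWeekData objects, because every readObject() call resolves to one of the cached instances.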
 