serializing large object (hashmap)

Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hello. If somebody has a minute, I seem to be going in circles on this one. I have a HashMap (referenced as data_map), which is a map of binary files, that I am serializing. However, when I load test it with larger files, I run out of Java heap space...

ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(data_map);

I've been trying to chunk the HashMap into the ObjectOutputStream (similar to, say, writing to an OutputStream for an HTTP connection), but it all comes back to the inability to read/write the object to any stream and/or convert it to bytes. Thank you for any input. I'll keep going at it.
[ August 22, 2007: Message edited by: Tom Griffith ]
Joseph Kampf
Greenhorn

Joined: Mar 04, 2004
Posts: 26
What you are doing is essentially creating a duplicate of your large map in memory. And it is even worse than that, because the serialized version of this map is a lot larger than the map's representation on the heap.

What are you doing with the ByteArrayOutputStream after you are done? Writing it to a blob? A file? Sending it over the network? If so, I would skip the middleman of the ByteArrayOutputStream and write directly to the destination OutputStream.
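
A minimal sketch of that suggestion, not code from the thread: the ObjectOutputStream wraps the destination stream directly, so the serialized bytes are never collected into one large in-memory array. The class name DirectSerialization, the method writeMap, and the String-to-byte[] map type are assumptions made for illustration; the destination could be a socket, HTTP, or file stream.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;

class DirectSerialization {

    // Serializes the map straight into the destination stream instead of
    // going through a ByteArrayOutputStream first.
    // Assumption: the map holds file names mapped to file contents.
    public static void writeMap(HashMap<String, byte[]> dataMap,
                                OutputStream destination) throws IOException {
        ObjectOutputStream out =
                new ObjectOutputStream(new BufferedOutputStream(destination));
        out.writeObject(dataMap);
        // Push any buffered bytes through; the caller owns and closes destination.
        out.flush();
    }
}
```

The map itself still has to fit in memory, but the second full copy held by the ByteArrayOutputStream (and the third made by toByteArray()) goes away.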



Originally posted by Tom Griffith:
Hello. If somebody has a minute, I seem to be going in circles on this one. I have a HashMap (referenced as data_map) that I am serializing; however, when I load test it, I run out of Java heap space...

ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(data_map);

I've been trying to chunk the HashMap into the ObjectOutputStream, but it all comes back to the inability to read/write the object as is to any stream (to convert it to bytes). I appreciate any help and input. Thank you.

[ August 22, 2007: Message edited by: Tom Griffith ]
Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hi... thank you for reading my post. Yeah, what I am doing is using the ByteArrayOutputStream to convert the map to a byte array and then writing the resulting byte array over the network in chunks (to avoid a heap space problem there). I guess the middleman's purpose is to bridge the HashMap object with a byte array...

ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(data_map);
byte[] buf = bos.toByteArray();

Is there another way I can convert the HashMap ~object~ directly to a byte array without going through the ByteArrayOutputStream (and ObjectOutputStream)? Thank you again...
[ August 22, 2007: Message edited by: Tom Griffith ]
Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hello. What I am able to get working, although the performance is not great, is to use writeObject on the HashMap and stream it to a local temporary .dat file. This replaces the in-memory ByteArrayOutputStream middleman by offloading the bytes to disk. Then I open an InputStream on the .dat file and send the bytes across the network in chunks. I really don't see another way. Thank you again for your valuable input; it made me look at the problem with an eye toward eliminating the redundant stream. Any additional input on doing this more efficiently than a temp .dat file would be appreciated, but I think I've exhausted all avenues. Thank you for reading this, everybody.
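
A rough sketch of the temp-file approach described in this post, since no code was given for it. The class name TempFileRelay, the method send, the 8 KB chunk size, and the temp file prefix are all assumptions; it uses java.nio.file and try-with-resources, which need Java 7 or later.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

class TempFileRelay {

    // Serializes the map to a temporary file, then copies that file to the
    // network stream in fixed-size chunks, so the serialized form never sits
    // in memory as a whole.
    public static void send(HashMap<String, byte[]> dataMap,
                            OutputStream network) throws IOException {
        Path tmp = Files.createTempFile("data_map", ".dat");
        try {
            // Step 1: offload the serialized bytes to disk.
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                out.writeObject(dataMap);
            }
            // Step 2: relay the file contents chunk by chunk.
            try (InputStream in = Files.newInputStream(tmp)) {
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    network.write(chunk, 0, n);
                }
            }
            network.flush();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The extra disk write and read is the likely source of the poor performance mentioned above; handing the network stream to the ObjectOutputStream directly would avoid it.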
Dilraj Singh
Greenhorn

Joined: Apr 22, 2007
Posts: 8
Well, you can do something about the objects stored in the hashtable if you can use custom serialization, i.e. writing only the properties that are useful to you rather than every field of each object to the object stream, or by implementing Externalizable.

I could understand the problem better if I had the following information:

Which application server are you using, if any? Are you using Java Message Service queues?
Where do you get this hashtable with file objects in it from?

regards,
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
Even if you do want to write out all fields, you can still make significant savings in serialised size, by using custom serialisation. This is because default serialisation writes out stuff like field names, Java class names etc, to the stream. That stuff can easily take up more space than the actual data! In custom serialisation, you can just write the data. Take care though, because this reduces the chance of successfully reading the data from an old build into a new build.
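
As an illustration of custom serialisation, here is a small sketch using Externalizable. The class FileEntry and its two fields are hypothetical and not taken from the thread. With Externalizable the class descriptor is still written once, but no field names or field types go to the stream, only what writeExternal emits.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical value class: one file name plus its contents.
class FileEntry implements Externalizable {
    private String name;
    private byte[] content;

    // Externalizable requires a public no-argument constructor.
    public FileEntry() { }

    public FileEntry(String name, byte[] content) {
        this.name = name;
        this.content = content;
    }

    public String getName() { return name; }

    public byte[] getContent() { return content; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Only the raw data is written: name, length, bytes.
        out.writeUTF(name);
        out.writeInt(content.length);
        out.write(content);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read back in exactly the order writeExternal wrote.
        name = in.readUTF();
        content = new byte[in.readInt()];
        in.readFully(content);
    }
}
```

This also shows the compatibility risk mentioned above: if a later build changes the order or set of values written, readExternal in that build can no longer read old data unless it handles the old layout explicitly.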


Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
What are the keys and values in your HashMap? It sounds like the values might be large arrays of bytes from binary files - is that the case?


"I'm not back." - Bill Harding, Twister
Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hello. Thank you for reading, everybody. Yeah, the map uses file names as keys and the respective byte arrays as values. At the HTTP destination, each value (byte array) is ultimately streamed to a new file named after its key (the file name). I wanted to use the map to allow more than one file to be transferred in a single call (although large files, i.e. large maps, will require a chunking loop to stream to the HTTP destination)...

I'm going to look at custom serialization, but I'm not so sure that applies, because I need both the key and the value from the map...
[ August 23, 2007: Message edited by: Tom Griffith ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Hm, I think chances are good that most of the time is being spent transmitting large arrays of bytes. Custom serialization will probably not help much in this case, since a byte array is pretty easy to serialize. It's just the volume of bytes that's the problem.

I think that the reason you need to break this into chunks is because of the way ObjectOutputStream and ObjectInputStream work. They both keep internal maps of all the objects that have been written through them, which is necessary so they can detect references to already-written objects and represent these with references to the already-written objects, rather than serializing new copies of those objects. But your client doesn't need or want to have all your Map's contents in memory at once. So you've had to break the map into chunks. Does that sound right?
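
The reference tracking described here can be observed with a small experiment. This is only an illustrative sketch (the class HandleTableDemo and the method bytesPerWrite are made up for it): it writes the same byte array three times and measures how many bytes each write adds, calling the standard ObjectOutputStream.reset() before the third write to clear the stream's table of already-written objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class HandleTableDemo {

    // Returns the number of bytes added by three successive writes of the
    // same array: a first write, a repeat write, and a write after reset().
    public static int[] bytesPerWrite(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(sink);
        out.flush();
        int mark = sink.size(); // stream header only

        out.writeObject(payload); // full copy of the array
        out.flush();
        int first = sink.size() - mark;
        mark = sink.size();

        out.writeObject(payload); // same object again: only a back-reference
        out.flush();
        int second = sink.size() - mark;
        mark = sink.size();

        out.reset();              // stream forgets everything written so far
        out.writeObject(payload); // so the array is written in full again
        out.flush();
        int third = sink.size() - mark;

        return new int[] {first, second, third};
    }

    public static void main(String[] args) throws IOException {
        int[] d = bytesPerWrite(new byte[100000]);
        System.out.println("first=" + d[0] + " second=" + d[1] + " third=" + d[2]);
    }
}
```

The second write adds only a handful of bytes because the stream still holds a reference to the array. That same held reference is what keeps every written object reachable until reset() is called or the stream is discarded.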

From what you've described, I think you might be best off not using serialization at all, and instead use a simple protocol with DataOutputStream and DataInputStream. E.g. the server could do something like this:
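
The code that originally followed is missing from this copy of the thread. The sketch below is a reconstruction under assumptions, not the original post's code: the class name MapSender, the method send, and the exact wire layout (an entry count, then for each entry a UTF file name, an int length, and the raw bytes) are guesses consistent with the later mention of dos.writeUTF().

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

class MapSender {

    // Writes the map as: int count, then per entry
    // UTF name, int length, raw bytes. No object serialization involved.
    public static void send(Map<String, byte[]> dataMap,
                            OutputStream destination) throws IOException {
        DataOutputStream dos =
                new DataOutputStream(new BufferedOutputStream(destination));
        dos.writeInt(dataMap.size());
        for (Map.Entry<String, byte[]> entry : dataMap.entrySet()) {
            byte[] data = entry.getValue();
            dos.writeUTF(entry.getKey()); // file name converted to bytes
            dos.writeInt(data.length);    // tells the reader how much to read
            dos.write(data);              // the contents are already bytes
        }
        dos.flush(); // the caller owns and closes destination
    }
}
```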

And the client could do something like this:
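
The client code is likewise missing from this copy of the thread. A hedged reconstruction, assuming a wire format of an int entry count followed, per entry, by a UTF file name, an int length, and the raw bytes: the names MapReceiver, receive, and FileHandler are invented here, and only doSomethingWith() comes from the post itself.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical callback that stands in for doSomethingWith().
interface FileHandler {
    void doSomethingWith(String name, byte[] data) throws IOException;
}

class MapReceiver {

    // Reads entries one at a time and hands each to the handler.
    // Nothing is kept here after the handler returns, so each array
    // becomes collectable as soon as the handler is done with it.
    public static void receive(InputStream source,
                               FileHandler handler) throws IOException {
        DataInputStream dis =
                new DataInputStream(new BufferedInputStream(source));
        int count = dis.readInt();
        for (int i = 0; i < count; i++) {
            String name = dis.readUTF();
            byte[] data = new byte[dis.readInt()];
            dis.readFully(data); // blocks until the whole file has arrived
            handler.doSomethingWith(name, data);
        }
    }
}
```

A handler that writes each array to a file named after its key, and keeps no reference to the array, would match the use case described earlier in the thread.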

Here it's important that whatever doSomethingWith() does, it should avoid saving any reference to the byte[] array. That way each one can be collected when you're done with it, and the required memory is only a little bigger than the largest single file you transfer.
Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hi Jim. It still seems to pose the same problem of how to set the InputStream (and subsequently, the DataInputStream) on the map object. I still think I would have to convert the map to bytes first in order to stream it. Is that right?
[ August 24, 2007: Message edited by: Tom Griffith ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I don't think I understand the question. That code does convert all the data in the map into bytes. The call to dos.writeUTF() converts the file names to bytes, and the byte arrays are already in bytes. I don't know what "how to set the inputstream (and subsequently, the DataInputStream) on the map object" means.
Tom Griffith
Ranch Hand

Joined: Aug 06, 2004
Posts: 257
Hi Jim. I think I'm the one that's kinda confused. I'll really mess with integrating data streams into this today and see how it pans out. I've used them before (in primitive times) to force XML into services. Thank you.
[ August 27, 2007: Message edited by: Tom Griffith ]
 