Searching a large file written by ObjectOutputStream

Barry Andrews
Ranch Hand

Joined: Sep 05, 2000
Posts: 523

Hi Folks,
I have a problem with large files and I was wondering if anyone here could give some suggestions for a solution.

I have a very large file that contains possibly several thousand objects. They are serialized using ObjectOutputStream. The file could be 1GB or larger! Each object is also quite large, possibly 5000KB per object. If I want to find an object in this file, I read each object until I find the one I need, throwing the other objects away as soon as they are read. Obviously I cannot keep everything in memory, as that would bring down the machine.
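For reference, the sequential scan described above might look like the following minimal sketch. The class name `SequentialScan`, the method `findFirst`, and the use of a `Predicate` to pick the wanted object are assumptions made for illustration, not details from the post.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.function.Predicate;

// Hypothetical helper: scans a serialized file front to back.
final class SequentialScan {

    /** Returns the first object accepted by 'wanted', or null if the file ends first. */
    static Object findFirst(File file, Predicate<Object> wanted)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                Object candidate;
                try {
                    candidate = in.readObject();
                } catch (EOFException endOfFile) {
                    return null; // reached the end without a match
                }
                if (wanted.test(candidate)) {
                    return candidate;
                }
                // Otherwise drop our reference and read the next object.
            }
        }
    }
}
```

One caveat: ObjectInputStream keeps its own table of the objects it has already read so that it can resolve back-references, so discarded objects may stay reachable until the stream is closed, unless the writer called reset() on the ObjectOutputStream.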

So I was wondering if it's possible to somehow keep a pointer in the file so that I can search faster. Using some other serialization is not an option. It must be done with ObjectOutputStream, so I am assuming that it must be read back using ObjectInputStream. But maybe there is another way?

One option is to split the file up into smaller chunks. Then if I know that I need object 101, I can go to the file that I know contains object 101. But I still have to read each object until I get to object 101. I would rather keep just one file, but if I cannot find another option I will have to do this.

Does anyone have any suggestions?

many thanks,

akalanka de silva

Joined: Sep 14, 2004
Posts: 18
Hey, send your code. I think there is a logical error.
Barry Andrews
Ranch Hand

Joined: Sep 05, 2000
Posts: 523

Logical error? There is no error. It is very simple. I use ObjectOutputStream to write and ObjectInputStream to read. Try to read 1000 large objects from a large file. It will take some time.

I guess what I am looking for is ideas on using something besides ObjectInputStream to read the objects back. Some way to quickly move a pointer to a certain position in the file so I don't have to read each object to get to the object I want. I am thinking of RandomAccessFile, but I am not sure where to place the pointer, i.e. where does one object end and another begin. Then there is constructing an object out of the bytes read, because I would no longer have the convenience of readObject().

Hopefully it is clear what I am trying to do? Any ideas?


Joe Ess

Joined: Oct 29, 2001
Posts: 9189

This is a design problem. ObjectOutputStream is provided simply for saving an object's state. It doesn't have the functionality which matches your requirements. If you can't change the storage mechanism you are stuck. Now, if you can change the storage mechanism, say serializing objects to a RandomAccessFile, SQL database or pure object database, we could talk about some options.
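If the storage mechanism can change, one possible shape for "serializing objects to a RandomAccessFile" is sketched below. Each object is serialized with its own ObjectOutputStream into a byte array and stored as a length-prefixed record, so every record is a complete stream that readObject() can still handle. The class name `RecordStore`, the record layout, and the in-memory offset list are all assumptions for illustration; a real version would need to persist the index or rebuild it by walking the length prefixes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical store: one independent serialization stream per record.
final class RecordStore implements Closeable {
    private final RandomAccessFile file;
    private final List<Long> offsets = new ArrayList<>(); // start offset of each record

    RecordStore(File path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    /** Appends one object as [int length][serialized bytes]; returns its record number. */
    int append(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(object);
        }
        byte[] bytes = buffer.toByteArray();
        long offset = file.length();
        file.seek(offset);
        file.writeInt(bytes.length);
        file.write(bytes);
        offsets.add(offset);
        return offsets.size() - 1;
    }

    /** Seeks straight to the requested record and deserializes only that record. */
    Object read(int recordNumber) throws IOException, ClassNotFoundException {
        file.seek(offsets.get(recordNumber));
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The trade-off is that objects shared between records are written once per record, so the file can be larger than a single shared stream would be.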

Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Going over my head here ... extend FileOutputStream and override write to count the bytes it writes. Write an object, get the count; that count is the zero-based offset of the next object. Write another object, get the count, etc. Then can you seek or skip bytes when it's time to read back in?
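A rough sketch of the byte-counting idea follows. It wraps the target in a FilterOutputStream subclass instead of extending FileOutputStream, which is a swapped-in variation; `CountingOutputStream` and `OffsetRecorder` are made-up names. With zero-based file offsets, the next object starts at exactly the number of bytes written so far, and ObjectOutputStream buffers internally, so it has to be flushed before the count is read.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that counts every byte passed to the underlying stream. */
final class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        out.write(buf, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

final class OffsetRecorder {
    /** Writes all objects through one ObjectOutputStream and returns each start offset. */
    static List<Long> writeAll(OutputStream target, List<? extends Serializable> objects)
            throws IOException {
        CountingOutputStream counter = new CountingOutputStream(target);
        List<Long> offsets = new ArrayList<>();
        // Closing the ObjectOutputStream also closes 'target'.
        try (ObjectOutputStream oos = new ObjectOutputStream(counter)) {
            for (Serializable object : objects) {
                oos.flush(); // push buffered bytes out so the count is current
                offsets.add(counter.getCount());
                oos.writeObject(object);
            }
        }
        return offsets;
    }
}
```

The first offset comes out as 4 because the stream starts with a 4-byte header. Knowing the offsets is only half the job: an ObjectInputStream opened at an arbitrary offset will not find that header, and later objects may refer back to data written earlier in the stream.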

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Barry Andrews
Ranch Hand

Joined: Sep 05, 2000
Posts: 523

hmmmmm.... Interesting. I will have to look into that.

Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
An important question here is: do these objects contain references to one another? If they do, you will have a very, very hard time trying to read the serialized objects out of sequence. Even if they don't contain such references to each other - there's a very high possibility this will never work. Your objects may well refer to other objects which are shared. E.g. if you have a class with a String field, and two instances of your class refer to the same String - you will have a very hard time reading the second instance unless you read the first instance first. Because the shared string will get serialized as part of the first instance, and the second instance will just serialize a reference to the first one. That's a rather imprecise description, and I'm not sure of all the details myself, but I think it's extremely unlikely you will be able to achieve what you're asking for. The object serialization protocol is very much designed for sequential access, period. Skipping steps is not really an option for objects which were all serialized together using the same ObjectOutputStream.
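The back-reference behavior can be observed with a small experiment, sketched here under the simplifying assumption that the shared object is written directly instead of being a field of two larger objects. `SharedReferenceDemo` is a made-up name. Writing the same object twice through one ObjectOutputStream produces a full encoding the first time and only a short handle the second time, which is why the second write cannot be read on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo: measures how many bytes each write of the same object takes.
final class SharedReferenceDemo {

    /** Returns {bytes used by the first write, bytes used by the second write}. */
    static int[] sizesOfTwoWrites(Serializable shared) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.flush();
            int afterHeader = bytes.size();

            oos.writeObject(shared); // full encoding of the object
            oos.flush();
            int afterFirst = bytes.size();

            oos.writeObject(shared); // same instance again: only a back-reference
            oos.flush();
            int afterSecond = bytes.size();

            return new int[] { afterFirst - afterHeader, afterSecond - afterFirst };
        }
    }
}
```

Calling reset() on the ObjectOutputStream between writes makes the writer forget what it has already sent, so the next write is a full encoding again, at the cost of a larger file.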

I think that by far, your best option is the one mentioned in the fourth paragraph of your first post here. Read the entire huge file once, and write a separate file for each object. Yes, it will take some time - but you only have to do it once. Then you never use the huge file again.
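A one-time conversion along these lines could look like the sketch below. The class name `SplitIntoFiles` and the `object-N.ser` file naming are assumptions for illustration. Each object gets a fresh ObjectOutputStream, so each small file is a complete stream that can be read without the others.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical one-time converter: big serialized file -> one file per object.
final class SplitIntoFiles {

    /** Reads the big file once and writes object N to its own file; returns the object count. */
    static int split(Path bigFile, Path outDir) throws IOException, ClassNotFoundException {
        Files.createDirectories(outDir);
        int index = 0;
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(bigFile)))) {
            while (true) {
                Object next;
                try {
                    next = in.readObject();
                } catch (EOFException endOfFile) {
                    break; // no more objects in the big file
                }
                try (ObjectOutputStream out = new ObjectOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(pathFor(outDir, index))))) {
                    out.writeObject(next);
                }
                index++;
            }
        }
        return index;
    }

    /** File name convention assumed by this sketch. */
    static Path pathFor(Path outDir, int index) {
        return outDir.resolve("object-" + index + ".ser");
    }

    /** Loads a single object directly, without touching the other files. */
    static Object load(Path outDir, int index) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(pathFor(outDir, index))))) {
            return in.readObject();
        }
    }
}
```

After the split, fetching object 101 is a single call such as `SplitIntoFiles.load(outDir, 101)`, assuming the objects are numbered from zero in the order they were written.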

Note that if your objects do contain extensive references to each other (not just to a few small shared objects like Strings) then this option won't really work either, as trying to serialize one object will end up serializing them all. In which case forget about trying to write separate files - each one will be as big as the original. You will just have to read the entire file and keep everything in memory. Or find some other way of storing your data besides object serialization.

"I'm not back." - Bill Harding, Twister