Searching a large file written by ObjectOutputStream

 
Ranch Hand
Posts: 529
C++ Java Ubuntu
Hi Folks,
I have a problem with large files and I was wondering if anyone here could give some suggestions for a solution.

I have a very large file that contains possibly several thousand objects, serialized using ObjectOutputStream. The file could be 1GB or more, and each object is also quite large, possibly 5000KB per object. If I want to find an object in this file, I read each object until I find the one I need, throwing the others away as soon as they are read. Obviously I cannot keep everything in memory, as that would bring down the machine.
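
Roughly, this is what the lookup looks like right now. It's just a simplified sketch: the real class is much larger, and Payload with its id field is only a stand-in here.

import java.io.*;

// Hypothetical stand-in for the real (much larger) serialized class.
class Payload implements Serializable {
    long id;
    byte[] data;
}

public class SequentialScan {
    // Read objects one by one, discarding each until the wanted id turns up.
    static Payload find(File file, long wantedId) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try {
            while (true) {
                Object obj = in.readObject();          // deserializes the whole object
                if (obj instanceof Payload && ((Payload) obj).id == wantedId) {
                    return (Payload) obj;
                }
            }
        } catch (EOFException endOfFile) {
            return null;                               // not found
        } finally {
            in.close();
        }
    }
}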

So I was wondering if it's possible to somehow keep a pointer in the file so that I can search faster. Using some other serialization is not an option. It must be done with ObjectOutputStream, so I am assuming that it must be read back using ObjectInputStream. But maybe there is another way?

One option is to split the file up into smaller chunks. Then if I know I need object 101, I can go straight to the chunk that I know contains object 101. But I still have to read each object in that chunk until I get to object 101. I would rather keep just one file, but if I cannot find another option I will have to do this.

Does anyone have any suggestions?

many thanks,

B
 
Greenhorn
Posts: 18
Hey, post your code. I think there may be a logical error in it.
 
Barry Andrews
Ranch Hand
Posts: 529
C++ Java Ubuntu
Logical error? There is no error. It is very simple. I use ObjectOutputStream to write and ObjectInputStream to read. Try to read 1000 large objects from a large file. It will take some time.

I guess what I am looking for is ideas on using something besides ObjectInputStream to read the objects back: some way to quickly move a pointer to a certain position in the file so I don't have to read every object just to get to the one I want. I am thinking of RandomAccessFile, but I am not sure where to place the pointer, i.e. where one object ends and the next begins. Then there is the problem of constructing an object out of the bytes read, because I would no longer have the convenience of readObject().
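
Something like the following is what I have in mind, assuming I somehow knew the offset and length of each object (which is exactly the part I don't have), and assuming the bytes at that offset form a complete, self-contained serialization with their own stream header:

import java.io.*;

public class RandomAccessRead {
    // Hypothetical: read one serialized object from a known position in the file.
    // Only works if the bytes at 'offset' are a complete serialization on their
    // own (own stream header, no references back to earlier objects).
    static Object readAt(File file, long offset, int length)
            throws IOException, ClassNotFoundException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(offset);
            byte[] bytes = new byte[length];
            raf.readFully(bytes);
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
            return in.readObject();
        } finally {
            raf.close();
        }
    }
}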

Hopefully it is clear what I am trying to do? Any ideas?


thanks,

B
 
Bartender
Posts: 9626
Mac OS X Linux Windows
This is a design problem. ObjectOutputStream is provided simply for saving an object's state; it doesn't have the functionality to match your requirements. If you can't change the storage mechanism, you are stuck. Now, if you can change the storage mechanism, say serializing objects to a RandomAccessFile, a SQL database, or a pure object database, we could talk about some options.
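
For instance, if a SQL database were on the table, each object could be serialized to a byte array and stored as a BLOB keyed by its id. A rough sketch, with made-up table and column names:

import java.io.*;
import java.sql.*;

public class BlobStore {
    // Assumed schema: CREATE TABLE objects (id BIGINT PRIMARY KEY, data BLOB)
    static void save(Connection con, long id, Serializable obj)
            throws SQLException, IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(obj);
        oos.close();
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO objects (id, data) VALUES (?, ?)");
        ps.setLong(1, id);
        ps.setBytes(2, buf.toByteArray());     // the serialized object as a BLOB
        ps.executeUpdate();
        ps.close();
    }

    static Object load(Connection con, long id)
            throws SQLException, IOException, ClassNotFoundException {
        PreparedStatement ps = con.prepareStatement(
                "SELECT data FROM objects WHERE id = ?");
        ps.setLong(1, id);
        ResultSet rs = ps.executeQuery();
        try {
            if (!rs.next()) return null;       // no row for this id
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(rs.getBytes(1)));
            return in.readObject();
        } finally {
            rs.close();
            ps.close();
        }
    }
}

Either way the point is the same: each object is stored and fetched by a key, instead of being buried somewhere inside one sequential stream.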
 
(instanceof Sidekick)
Posts: 8791
Going over my head here... extend FileOutputStream and override write() to count the bytes it writes. Write an object, get the count; the offset of the next object is count + 1; write another object, get the count again, and so on. Then can you seek or skip bytes when it's time to read back in?
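
Something roughly along those lines as a sketch (my own variation, untested): serialize each object into its own byte array so it gets its own stream header, write it as a length-prefixed record while tracking the running byte count, and keep the id-to-offset index on the side. Reading back is then a seek plus one deserialization. All the names here are made up:

import java.io.*;
import java.util.*;

public class IndexedObjectFile {
    private final Map<Long, Long> index = new HashMap<Long, Long>();  // id -> file offset
    private long offset = 0;                                          // running byte count

    // Append one object as an independent record: [int length][serialized bytes].
    public void append(DataOutputStream out, long id, Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);   // fresh stream => own header
        oos.writeObject(obj);
        oos.close();
        byte[] bytes = buf.toByteArray();
        index.put(id, offset);          // remember where this record starts
        out.writeInt(bytes.length);
        out.write(bytes);
        offset += 4 + bytes.length;     // 4 bytes for the length prefix
    }

    // Seek straight to the record and deserialize just that one object.
    public Object read(File file, long id) throws IOException, ClassNotFoundException {
        Long pos = index.get(id);
        if (pos == null) return null;
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(pos);
            byte[] bytes = new byte[raf.readInt()];
            raf.readFully(bytes);
            return new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
        } finally {
            raf.close();
        }
    }
}

The index map itself would of course have to be saved somewhere (a small separate file would do) so it survives between runs.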
 
Barry Andrews
Ranch Hand
Posts: 529
C++ Java Ubuntu
hmmmmm.... Interesting. I will have to look into that.

Thanks!
 
Wanderer
Posts: 18671
An important question here is: do these objects contain references to one another? If they do, you will have a very, very hard time trying to read the serialized objects out of sequence. Even if they don't refer to each other directly, there's a very high chance this will never work, because your objects may well refer to other objects which are shared. E.g. if you have a class with a String field, and two instances of your class refer to the same String, you will have a very hard time reading the second instance unless you read the first instance first: the shared String gets serialized as part of the first instance, and the second instance just serializes a reference back to it.

That's a rather imprecise description, and I'm not sure of all the details myself, but I think it's extremely unlikely you will be able to achieve what you're asking for. The object serialization protocol is very much designed for sequential access, period. Skipping ahead is not really an option for objects which were all serialized together using the same ObjectOutputStream.

I think that by far, your best option is the one mentioned in the fourth paragraph of your first post here. Read the entire huge file once, and write a separate file for each object. Yes, it will take some time - but you only have to do it once. Then you never use the huge file again.
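
Assuming the objects really are independent, that one-time split could look roughly like this (the numbering-by-read-order and the file naming are made up for illustration):

import java.io.*;

public class SplitBigFile {
    // One-time conversion: read the huge file sequentially and write each
    // object out to its own small file, which can later be loaded directly.
    public static void split(File bigFile, File outputDir)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(bigFile)));
        try {
            int i = 0;
            while (true) {
                Object obj = in.readObject();
                ObjectOutputStream out = new ObjectOutputStream(
                        new FileOutputStream(new File(outputDir, "object-" + i + ".ser")));
                out.writeObject(obj);
                out.close();
                i++;
            }
        } catch (EOFException endOfFile) {
            // reached the end: every object now has its own file
        } finally {
            in.close();
        }
    }
}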

Note that if your objects do contain extensive references to each other (not just to a few small shared objects like Strings) then this option won't really work either, as trying to serialize one object will end up serializing them all. In which case forget about trying to write separate files - each one will be as big as the original. You will just have to read the entire file and keep everything in memory. Or find some other way of storing your data besides object serialization.
 