result is a list of type DataType and could be very large. I need to keep this in memory as I'm using this list in the other parts of the code and iterating over it. My application could create a very large list at times and causes out of memory exception. Setting max heap size using -Xmx will not help in extreme cases. Basically I need to come up with a smart scheme to manage the list that causes minimum refactoring in my code. Any suggestions? Thanks. -Dan
[S Dan]: I need to keep this in memory as I'm using this list in the other parts of the code and iterating over it.
Yet later you establish that you really can't keep the whole thing in memory.
It sounds like lazy loading isn't quite enough - the list also needs to be able to unload somehow, too. Otherwise the first time you iterate through the list, you'll fill the list and run out of memory. You could use a List<WeakReference<DataType>> internally I suppose, and reload as necessary.
I would probably start by implementing just an Iterator that loads from the database. If you're using JDK 5+, make an Iterable<DataType> for ease of use. Otherwise you can make a List but try making many of the oterations throw UnsupportedOperation - you may find that most of the clients who need the List just need it for iteration. Here I'm assuming that you can test the whole application at once before releasing your solution publicly. (Or use a usage search, available from any decent IDE.) If your clients are far away and you don't know just what they're doing with your List, you may upset people if you start throwing exceptions from previously-working methods.
If the database is slow, as is often the case, you may get better results by using seralization and writing the records to local files, then reading them. This is especially true if the database is across a slow network. Of course if users modify the list or its contents, that probably means you need to modify the database as well, along with any local files. This can get messy quickly. In that case using a scrollable cursor in the database may be your best bet, forgetting about the local files.
A more complex List implementation might initially just use an ArrayList to keep everything in memory. But when the size exceeds some limit, it would switch to a DB-backed or file-backed solution along the lines above. Then you can still get fast performance when the list is small enough, but you can handle larger data sets when necessary. [ January 16, 2008: Message edited by: Jim Yingst ]
You could try a RandomAccessFile if you develop a sensible storage format for your objects. Then you just need to seek to the correct place and decode the information into a temporary array. All this could be done in a List implementation to preserve compatibility with existing code - you might want to be clever too and cache a sensible number of entries into a local array for fast access (since disk access is slow). This basically is creating your own swap space on disk - you'll easily be able to store GBs of data in the file (on 64bit TBs in fact).
Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340 / AmazonAmazon UK )
Originally posted by S Dan: List<DataType> result = Compute();
Do you have control on the design of the "DataType" class? If so, is each object of that class potentially large? If so, are there some things you can do to make each DataType object small (at least on average)? If not, do you only need a certain subset of DataType in the list? If so, perhaps you can use a different class-design.
Does this list get modified after the initial creation?
Joined: Mar 27, 2003
Replying to myself
You could try a RandomAccessFile if you develop a sensible storage format for your objects.
This would be best if DataType is Externalizable and the objects are stored back-to-back - then you know how large each one is since you write the storage format (making each serialization the same size would be ideal for seek purposes). That is fine if you can afford the overheads of deserializing and slow disk access times. Though I can't really see a way round this - your storage is either in memory or on disk if you really can't optimise your code.
Joined: Jul 13, 2008
Originally posted by Ernest Friedman-Hill:
Well, yes, that's not surprising, considering you're the project lead.
I spoke about joafip in this forum since I had to solve the problems of hug data model that can not be store in memory. It is based on RandomAccessFile with a Heap in file management layer, this may be enought to store variables size object in file. Joafip use an other upper layer that visit object graph for saving and proxy for lazy load.
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com
subject: How to manage a very very large list in memory?