How to manage a very very large list in memory?

S Dan
Greenhorn

Joined: Apr 05, 2005
Posts: 25
Let's consider the following line:

List<DataType> result = Compute();

result is a list of type DataType and could be very large.
I need to keep this in memory as I'm using this list in the
other parts of the code and iterating over it. My application
can create a very large list at times, which causes an
OutOfMemoryError. Setting the max heap size using -Xmx will not
help in extreme cases. Basically, I need to come up
with a smart scheme to manage the list that requires minimal
refactoring of my code. Any suggestions? Thanks.
-Dan
Abhinav Srivastava
Ranch Hand

Joined: Nov 19, 2002
Posts: 349

If you can provide your own implementation of List, you can implement lazy-loading. Also, if this list is coming from a persistent store, you can use a scrollable cursor.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
[S Dan]: I need to keep this in memory as I'm using this list in the
other parts of the code and iterating over it.


Yet later you establish that you really can't keep the whole thing in memory.

It sounds like lazy loading isn't quite enough - the list also needs to be able to unload somehow, too. Otherwise the first time you iterate through the list, you'll fill the list and run out of memory. You could use a List<WeakReference<DataType>> internally I suppose, and reload as necessary.
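A minimal sketch of the List<WeakReference<DataType>> idea, assuming the list is read-only and that any element can be re-fetched by index. The class name ReloadingList, the loader function and the loads counter are hypothetical and not part of the original code:

```java
import java.lang.ref.WeakReference;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Read-only list whose elements can be garbage-collected and reloaded on demand. */
class ReloadingList<T> extends AbstractList<T> {
    private final List<WeakReference<T>> refs;  // one slot per element, null until first load
    private final IntFunction<T> loader;        // hypothetical: fetches element i from the backing store
    int loads = 0;                              // number of loader calls, for illustration only

    ReloadingList(int size, IntFunction<T> loader) {
        this.loader = loader;
        this.refs = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            refs.add(null);
        }
    }

    @Override
    public T get(int index) {
        WeakReference<T> ref = refs.get(index);
        T value = (ref == null) ? null : ref.get();
        if (value == null) {                    // never loaded, or collected since the last access
            value = loader.apply(index);
            loads++;
            refs.set(index, new WeakReference<>(value));
        }
        return value;
    }

    @Override
    public int size() {
        return refs.size();
    }
}
```

Weakly referenced elements may be collected as soon as no caller holds them, so reloads can be frequent. Swapping in java.lang.ref.SoftReference keeps elements around until memory actually gets tight, which may suit a cache better.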

I would probably start by implementing just an Iterator that loads from the database. If you're using JDK 5+, make an Iterable<DataType> for ease of use. Otherwise you can make a List, but try making many of the operations throw UnsupportedOperationException - you may find that most of the clients who need the List just need it for iteration. Here I'm assuming that you can test the whole application at once before releasing your solution publicly. (Or use a usage search, available from any decent IDE.) If your clients are far away and you don't know just what they're doing with your List, you may upset people if you start throwing exceptions from previously-working methods.
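One way the Iterable approach might look, as a sketch only. PagedIterable and its pageLoader callback are hypothetical names; in a real application the callback would presumably run a query that returns at most `limit` rows starting at `offset`:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Iterable that pulls one page of rows at a time, so only one page is held in memory. */
class PagedIterable<T> implements Iterable<T> {
    private final int pageSize;
    private final BiFunction<Integer, Integer, List<T>> pageLoader; // (offset, limit) -> rows; hypothetical

    PagedIterable(int pageSize, BiFunction<Integer, Integer, List<T>> pageLoader) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        this.pageLoader = pageLoader;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Iterator<T> page = Collections.emptyIterator();
            private int offset = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Fetch the next page only when the current one is used up.
                while (!page.hasNext() && !exhausted) {
                    List<T> rows = pageLoader.apply(offset, pageSize);
                    offset += rows.size();
                    exhausted = rows.size() < pageSize;  // a short page means the source is drained
                    page = rows.iterator();
                }
                return page.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
    }
}
```

Existing code of the form `for (DataType d : result)` keeps working if `result` is declared as Iterable<DataType> instead of List<DataType>.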

If the database is slow, as is often the case, you may get better results by using serialization and writing the records to local files, then reading them. This is especially true if the database is across a slow network. Of course if users modify the list or its contents, that probably means you need to modify the database as well, along with any local files. This can get messy quickly. In that case using a scrollable cursor in the database may be your best bet, forgetting about the local files.
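A rough sketch of the local-file variant using standard Java serialization. RecordSpool, spool and replay are hypothetical names, and the null end marker assumes the records themselves are never null:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class RecordSpool {
    /** Writes every record to a temp file, followed by a null end marker. */
    static <T extends Serializable> File spool(Iterable<T> records) {
        try {
            File file = File.createTempFile("records", ".ser");
            file.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(file)))) {
                for (T record : records) {
                    out.writeObject(record);
                    out.reset();  // clear the stream's back-reference table so it does not grow
                }
                out.writeObject(null);  // end marker; assumes records are never null
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the records back one at a time and returns how many were visited. */
    static <T> int replay(File file, Consumer<T> visitor) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int visited = 0;
            Object next;
            while ((next = in.readObject()) != null) {
                @SuppressWarnings("unchecked")
                T record = (T) next;
                visitor.accept(record);
                visited++;
            }
            return visited;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot read " + file, e);
        }
    }
}
```

Only one record is live at a time during replay, so memory use stays flat regardless of how many records the file holds.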

A more complex List implementation might initially just use an ArrayList to keep everything in memory. But when the size exceeds some limit, it would switch to a DB-backed or file-backed solution along the lines above. Then you can still get fast performance when the list is small enough, but you can handle larger data sets when necessary.
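The switch-over idea could be sketched roughly as follows, assuming an append-only list of Serializable elements. SpillingList, memoryLimit and isSpilled are hypothetical names; after the spill, only one file offset per element stays in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Append-only list: an ArrayList while small, a temp file once memoryLimit is reached. */
class SpillingList<T extends Serializable> extends AbstractList<T> {
    private final int memoryLimit;
    private List<T> inMemory = new ArrayList<>();          // null after spilling
    private final List<Long> offsets = new ArrayList<>();  // file position of each record
    private RandomAccessFile file;                         // null until spilled

    SpillingList(int memoryLimit) { this.memoryLimit = memoryLimit; }

    boolean isSpilled() { return file != null; }

    @Override
    public boolean add(T element) {
        try {
            if (file == null && inMemory.size() >= memoryLimit) {
                spill();
            }
            if (file == null) {
                inMemory.add(element);
            } else {
                append(element);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        modCount++;
        return true;
    }

    @Override
    public T get(int index) {
        if (file == null) {
            return inMemory.get(index);
        }
        try {
            file.seek(offsets.get(index));
            byte[] bytes = new byte[file.readInt()];  // each record is length-prefixed
            file.readFully(bytes);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                @SuppressWarnings("unchecked")
                T value = (T) in.readObject();
                return value;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot read record " + index, e);
        }
    }

    @Override
    public int size() { return file == null ? inMemory.size() : offsets.size(); }

    /** Moves everything held in memory to a new temp file. */
    private void spill() throws IOException {
        File temp = File.createTempFile("spill", ".bin");
        temp.deleteOnExit();
        file = new RandomAccessFile(temp, "rw");
        for (T element : inMemory) {
            append(element);
        }
        inMemory = null;
    }

    /** Serializes one element and writes it at the end of the file. */
    private void append(T element) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(element);
        }
        long position = file.length();
        file.seek(position);
        file.writeInt(buffer.size());
        file.write(buffer.toByteArray());
        offsets.add(position);
    }
}
```

Callers see an ordinary List either way, so small result sets keep ArrayList speed and large ones trade speed for bounded memory.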
[ January 16, 2008: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
luc peuvrier
Greenhorn

Joined: Jul 13, 2008
Posts: 2
To manage a huge collection without using a database, I use joafip.
Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
You could try a RandomAccessFile if you develop a sensible storage format for your objects. Then you just need to seek to the correct place and decode the information into a temporary array. All this could be done in a List implementation to preserve compatibility with existing code - you might want to be clever too and cache a sensible number of entries into a local array for fast access (since disk access is slow). This basically is creating your own swap space on disk - you'll easily be able to store GBs of data in the file (on 64-bit systems, TBs in fact).
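To illustrate the seek-and-decode idea with a cache, here is a sketch that stores plain long values, 8 bytes each, instead of real DataType objects. DiskLongList, onTempFile and diskReads are hypothetical names, and the cache is an access-ordered LinkedHashMap rather than a local array:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.AbstractList;
import java.util.LinkedHashMap;
import java.util.Map;

/** Append-only list of longs kept in a file, with a small LRU cache in front of the disk. */
class DiskLongList extends AbstractList<Long> {
    private static final int RECORD_BYTES = Long.BYTES;
    private final RandomAccessFile file;
    private final Map<Integer, Long> cache;
    private int count;
    int diskReads = 0;  // number of reads that went to the file, for illustration only

    DiskLongList(File backing, final int cacheSize) {
        try {
            file = new RandomAccessFile(backing, "rw");
            count = (int) (file.length() / RECORD_BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        cache = new LinkedHashMap<Integer, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
                return size() > cacheSize;
            }
        };
    }

    static DiskLongList onTempFile(int cacheSize) {
        try {
            File temp = File.createTempFile("longs", ".bin");
            temp.deleteOnExit();
            return new DiskLongList(temp, cacheSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean add(Long value) {
        try {
            file.seek((long) count * RECORD_BYTES);  // position just past the last record
            file.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        count++;
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Long cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        try {
            file.seek((long) index * RECORD_BYTES);  // record i starts at i * RECORD_BYTES
            long value = file.readLong();
            diskReads++;
            cache.put(index, value);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public int size() {
        return count;
    }
}
```

Because it extends AbstractList, existing code that only iterates or calls get(int) should not need to change.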


Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340 / Amazon Amazon UK )
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24166
    

Originally posted by luc peuvrier:
To manage a huge collection without using a database, I use joafip.


Well, yes, that's not surprising, considering you're the project lead.


[Jess in Action][AskingGoodQuestions]
Darius Cooper
Greenhorn

Joined: Jul 10, 2008
Posts: 8
Originally posted by S Dan:
List<DataType> result = Compute();

Do you have control over the design of the "DataType" class? If so, is each object of that class potentially large? If so, are there some things you can do to make each DataType object small (at least on average)? If not, do you only need a certain subset of DataType in the list? If so, perhaps you can use a different class-design.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12675
    
Does this list get modified after the initial creation?

Bill


Java Resources at www.wbrogden.com
Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
Replying to myself
You could try a RandomAccessFile if you develop a sensible storage format for your objects.
This would be best if DataType is Externalizable and the objects are stored back-to-back - then you know how large each one is since you write the storage format (making each serialization the same size would be ideal for seek purposes). That is fine if you can afford the overheads of deserializing and slow disk access times. Though I can't really see a way round this - your storage is either in memory or on disk if you really can't optimise your code.
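As a sketch of a same-size storage format, the class below encodes a record into exactly RECORD_BYTES bytes. It uses a hand-written layout with ByteBuffer rather than Externalizable, and FixedRecord with its id and name fields is a hypothetical stand-in for DataType:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical record with a fixed-width encoding: int id, 1-byte name length, padded name. */
class FixedRecord {
    static final int MAX_NAME_BYTES = 31;
    static final int RECORD_BYTES = Integer.BYTES + 1 + MAX_NAME_BYTES;  // 36 bytes per record

    final int id;
    final String name;

    FixedRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    /** Encodes this record into exactly RECORD_BYTES bytes; unused name bytes stay zero. */
    byte[] encode() {
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);
        if (raw.length > MAX_NAME_BYTES) {
            throw new IllegalArgumentException("name longer than " + MAX_NAME_BYTES + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.allocate(RECORD_BYTES);
        buffer.putInt(id).put((byte) raw.length).put(raw);
        return buffer.array();
    }

    /** Decodes a record previously produced by encode(). */
    static FixedRecord decode(byte[] record) {
        ByteBuffer buffer = ByteBuffer.wrap(record);
        int id = buffer.getInt();
        int length = buffer.get();
        byte[] raw = new byte[length];
        buffer.get(raw);
        return new FixedRecord(id, new String(raw, StandardCharsets.UTF_8));
    }

    /** File position of the record at the given index when records are stored back-to-back. */
    static long offsetOf(long index) {
        return index * RECORD_BYTES;
    }
}
```

With every record the same size, reading entry i from a RandomAccessFile comes down to seek(FixedRecord.offsetOf(i)) followed by readFully into a RECORD_BYTES buffer, with no index to maintain.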
luc peuvrier
Greenhorn

Joined: Jul 13, 2008
Posts: 2
Originally posted by Ernest Friedman-Hill:


Well, yes, that's not surprising, considering you're the project lead.


I mentioned joafip in this forum because I had to solve the problem of a huge data model that cannot be stored in memory. It is based on RandomAccessFile with a heap-in-file management layer, which may be enough to store variable-size objects in a file. Joafip uses another, upper layer that visits the object graph for saving and uses proxies for lazy loading.
 