JavaRanch » Java Forums » Java » Java in General

File backed collection

Raees Uzhunnan
Ranch Hand

Joined: Aug 15, 2002
Posts: 126
I am looking for a Java collection-like API that is backed by a file, so that it can be recovered after a crash and rebuilt in memory. Let me know if you have come across something like this.


Sun Certified Enterprise Architect
Java Technology Blog
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Interesting idea. I guess you could implement Collection over a database, or maybe use an object database. What have you considered?

A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Raees Uzhunnan
Ranch Hand

Joined: Aug 15, 2002
Posts: 126
I want a solution that performs in microseconds. The java.io.RandomAccessFile API is what I am thinking of using; its reads and writes take microseconds, while database operations are expensive, on the order of milliseconds.
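To make the RandomAccessFile idea concrete, here is a minimal sketch of a fixed-length record store over RandomAccessFile. The class name, the 100-byte record size, and the slot-based layout are illustrative assumptions, not anything from the thread: record slot i lives at byte offset i * RECORD_SIZE, so each read or write is a single seek plus one I/O call rather than a scan of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: fixed-length records stored over RandomAccessFile.
// Slot i lives at offset i * RECORD_SIZE, so access is one seek + one I/O.
class FixedRecordFile {
    static final int RECORD_SIZE = 100; // bytes per record (assumed size)
    private final RandomAccessFile file;

    FixedRecordFile(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    void write(int slot, byte[] record) throws IOException {
        if (record.length != RECORD_SIZE)
            throw new IllegalArgumentException("record must be " + RECORD_SIZE + " bytes");
        file.seek((long) slot * RECORD_SIZE); // jump straight to the slot
        file.write(record);
    }

    byte[] read(int slot) throws IOException {
        byte[] buf = new byte[RECORD_SIZE];
        file.seek((long) slot * RECORD_SIZE);
        file.readFully(buf); // read exactly one record
        return buf;
    }

    void close() throws IOException {
        file.close();
    }
}
```

After a crash, the file still holds every record that was written, so the in-memory view can be rebuilt by reading the slots back.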

Nitesh Kant

Joined: Feb 25, 2007
Posts: 1638

Is the only requirement that the collection should survive a crash?
Why not serialize the collection and read it back once at load time?
You could probably use Externalizable to customize how your collection is serialized.
Can some other process also modify the file? If not, then not every read needs to translate into an I/O operation (read once and cache), whereas every write should.
Does this help?
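The serialize-and-reload suggestion above can be sketched in a few lines. This is a minimal illustration, not a drop-in implementation: the class and file names are made up, and in practice you would save on each write (or periodically) and reload once at startup after a crash.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: keep the collection in memory, serialize it to disk to survive
// a crash, and deserialize it once at load time to rebuild it.
class SerializedList {
    @SuppressWarnings("unchecked")
    static List<String> load(File f) throws IOException, ClassNotFoundException {
        if (!f.exists()) return new ArrayList<>(); // nothing saved yet
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (List<String>) in.readObject();
        }
    }

    static void save(File f, List<String> list) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(list);
        }
    }
}
```

Implementing Externalizable on the element type would let you replace the default serialized form with something more compact, at the cost of writing readExternal/writeExternal by hand.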

apigee, a better way to API!
Joe Ess

Joined: Oct 29, 2001
Posts: 8836

Using RandomAccessFile to build a database-like structure is well-traveled ground.
There are also a number of fast embedded databases like Berkeley DB that may fit the bill.

"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
[How To Ask Questions On JavaRanch]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
That article Joe linked on indexed files is neat. Reminds me of mainframe VSAM, but there I think the index was distributed in the, um, Control Areas and Control Intervals. Puts could be very fast except when one of those areas got full and required a split. I still have some routines that compute optimal free space based on record sizes, insert rates and mean time between reorg.

I'd worry about schemes like this having occasional performance hits that could make things feel uneven. And it won't be long before your code is complex enough to be slower than a real database.

Is there any opportunity to update memory in real time and update the file store asynchronously? If recovery from file is rare, say only after an uncommon disaster, consider a transaction log file that you could re-apply to the last good backup. That could be a very fast append-only write.
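The transaction-log idea above can be sketched as follows. This is a toy illustration under assumed names (TxLog, the ADD/DEL line format): state changes go to memory immediately, each change is also appended to a log file (a fast, sequential write), and after a crash the log is replayed to rebuild the state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Sketch: append-only transaction log that can be replayed after a crash.
class TxLog {
    private final Path log;

    TxLog(Path log) { this.log = log; }

    // Append one operation; appends are sequential and therefore fast.
    void append(String op) throws IOException {
        Files.write(log, (op + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Recovery: re-apply every logged ADD/DEL to rebuild the in-memory set.
    Set<String> replay() throws IOException {
        Set<String> state = new HashSet<>();
        if (!Files.exists(log)) return state;
        for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
            if (line.startsWith("ADD ")) state.add(line.substring(4));
            else if (line.startsWith("DEL ")) state.remove(line.substring(4));
        }
        return state;
    }
}
```

Periodically snapshotting the state and truncating the log keeps replay time bounded, which is the "last good backup plus log" scheme described above.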
Bill Shirley
Ranch Hand

Joined: Nov 08, 2007
Posts: 457
What are the other requirements on the collection?
Is it likely to be 10 items or 10,000 items? Or 10,000,000,000?
Will the contents be homogeneous or heterogeneous?
Will the items in the collection be changed often?
Will others access the backing file? For update or read? Concurrently?

More details are needed to make your implementation decision.

Bill Shirley - bshirley - frazerbilt.com
if (Posts < 30) you.read( JavaRanchFAQ);
Paul Sturrock

Joined: Apr 14, 2004
Posts: 10336

Have you considered one of the many caching frameworks out there? JCS, for example, is (at its most basic) a Map that is spooled to disc.

JavaRanch FAQ HowToAskQuestionsOnJavaRanch
Raees Uzhunnan
Ranch Hand

Joined: Aug 15, 2002
Posts: 126
Answering several questions in this post: yes, we used the Tangosol caching product. It is excellent for in-memory caching, but not so good at synchronizing to disk. I will look at the JCS product.

The next answer is that we want to write at most 10 MB to a file, and no more, because of the performance hit we have observed on files of 10 MB or larger. At 100 bytes per record, that holds approximately 100,000 records.

No, the content of a record will not be updated, but inserts and deletes will occur for 100% of the records. The content will be homogeneous, and other threads in the JVM should be able to access it (access needs to be synchronized).

Yes, we also don't want to go beyond 100,000 records per collection on a file. As stated earlier, this file-backed collection is intermediate storage for records until they get pushed to a database. The arrangement is purely for performance/latency when we have bursts of records coming in. A pool of threads will read from this collection, persist the records to the database, and remove them from the collection.

If the events exceed this number, then we will bypass the collection and write directly to the database, taking the performance hit. I believe this will never happen, because we will have enough threads reading from the file and batch-updating the database.
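The arrangement described, a bounded buffer that absorbs bursts, with overflow bypassing straight to the database and consumer threads draining batches, can be sketched with a BlockingQueue. Everything here is illustrative: the class name, the 100,000-element capacity, and persistBatch/writeDirect are hypothetical stand-ins for the real database calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: bounded in-memory buffer for bursts of incoming records.
// Producers enqueue; when the buffer is full, records bypass to the DB.
// Consumer threads drain batches and persist them.
class BurstBuffer {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100_000);

    void submit(String record) {
        if (!queue.offer(record)) {       // offer() fails when the buffer is full
            writeDirect(record);          // buffer full: take the DB hit directly
        }
    }

    // Called by consumer threads; returns how many records were persisted.
    int drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);   // pull up to maxBatch records at once
        persistBatch(batch);
        return batch.size();
    }

    void persistBatch(List<String> batch) { /* hypothetical batch DB insert */ }
    void writeDirect(String record)       { /* hypothetical single DB insert */ }
}
```

Note this buffer alone does not survive a crash; combining it with a file-backed log (as discussed earlier in the thread) would cover that requirement.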

[ December 18, 2007: Message edited by: Raees Uzhunnan ]
Ilja Preuss

Joined: Jul 11, 2001
Posts: 14112
You might want to take a look at http://www.prevayler.org for an alternative persistence solution. I don't have any experience with it, but it sounds interesting, and like something that at least might give you some new ideas.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus