
File backed collection

 
Raees Uzhunnan
Ranch Hand
Posts: 126
I am looking for a Java collection-like API that is backed by a file, so that after a crash its contents can be recovered and rebuilt in memory. Let me know if you have come across something like this.

Thanks
Raees
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Interesting idea. I guess you could implement Collection over a database, or maybe use an object database. What have you considered?
 
Raees Uzhunnan
Ranch Hand
Posts: 126
I want a solution that performs in microseconds. The java.io.RandomAccessFile API is what I am thinking of using; its reads and writes complete in microseconds, whereas database operations are expensive, taking milliseconds.
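A minimal sketch of what fixed-length record access over RandomAccessFile could look like, assuming 100-byte records addressed by slot index (the class name, file name, and slot scheme are illustrative, not a finished design):

import java.io.IOException;
import java.io.RandomAccessFile;

// Fixed-length records in a single file; each record occupies one
// 100-byte slot, so locating a record is simple arithmetic.
public class RecordFile {
    private static final int RECORD_SIZE = 100;
    private final RandomAccessFile file;

    public RecordFile(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    // Overwrite the slot at the given index with exactly one record.
    public void write(long index, byte[] record) throws IOException {
        file.seek(index * RECORD_SIZE);
        file.write(record, 0, RECORD_SIZE);
    }

    // Read back the record stored at the given slot index.
    public byte[] read(long index) throws IOException {
        byte[] buf = new byte[RECORD_SIZE];
        file.seek(index * RECORD_SIZE);
        file.readFully(buf);
        return buf;
    }

    public void close() throws IOException {
        file.close();
    }
}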

Thanks
 
Nitesh Kant
Bartender
Posts: 1638
Is the only requirement that the collection should survive a crash?
Why not serialize the collection and read it once at load?
You could probably use Externalizable to control how your collection is serialized.
Is it the case that some other process can also modify the file? If not, then every read need not translate to an I/O operation (read once and cache), whereas every write should.
Does this help?
 
Joe Ess
Bartender
Posts: 9214
Using RandomAccessFile to build a database-like structure is well-traveled ground.
There are also a number of fast embedded databases, like Berkeley DB, that may fit the bill.
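For a sense of scale, a hedged sketch of basic Berkeley DB Java Edition usage as an embedded, crash-recoverable key/value store (the environment directory, database name, and keys shown are assumptions):

import java.io.File;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

// Berkeley DB JE stores byte-array keys and values; the environment
// directory holds its log files and supports recovery after a crash.
public class BdbSketch {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(new File("bdb-env"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database db = env.openDatabase(null, "records", dbConfig);

        // Keys and values are byte arrays wrapped in DatabaseEntry.
        db.put(null, new DatabaseEntry("key-1".getBytes()),
                     new DatabaseEntry("first record".getBytes()));

        db.close();
        env.close();
    }
}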
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
That article Joe linked on indexed files is neat. Reminds me of mainframe VSAM, but there I think the index was distributed in the, um, Control Areas and Control Intervals. Puts could be very fast except when one of those areas got full and required a split. I still have some routines that compute optimal free space based on record sizes, insert rates and mean time between reorg.

I'd worry about schemes like this having occasional performance hits that could make things feel uneven. And it won't be long before your code is complex enough to be slower than a real database.

Is there any opportunity to update memory in real time and update the file store asynchronously? If recovery from file is rare, say only after an uncommon disaster, consider a transaction log file that you could re-apply to the last good backup. That could be a very fast append-only write.
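A minimal sketch of such an append-only log (the record framing and file name are assumptions; syncing on every append trades throughput for durability):

import java.io.FileOutputStream;
import java.io.IOException;

// Append-only transaction log: each accepted record is appended and
// forced to disk, so the log can be replayed against the last good
// backup after a crash.
public class TransactionLog {
    private final FileOutputStream out;

    public TransactionLog(String path) throws IOException {
        out = new FileOutputStream(path, true); // open in append mode
    }

    public synchronized void append(byte[] record) throws IOException {
        out.write(record);
        out.getFD().sync(); // force the bytes to the device
    }

    public void close() throws IOException {
        out.close();
    }
}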
 
Bill Shirley
Ranch Hand
Posts: 457
Raees,
What are the other requirements on the collection?
Is it likely to be 10 items or 10,000 items? Or 10,000,000,000?
Will the contents be homogeneous or heterogeneous?
Will the items in the collection be changed often?
Will others access the backing file? For update or read? Concurrently?

More details are needed to make your implementation decision.
 
Paul Sturrock
Bartender
Posts: 10336
Have you considered one of the many caching frameworks out there? JCS, for example, is (at its most basic) a Map that is spooled to disk.
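A hedged sketch of basic JCS usage (this is the classic org.apache.jcs API; the region name and the disk auxiliary configured in cache.ccf are assumptions):

import org.apache.jcs.JCS;
import org.apache.jcs.access.exception.CacheException;

// At its most basic, JCS behaves like a Map; with an indexed disk
// auxiliary configured, entries overflow from memory to disk.
public class JcsSketch {
    public static void main(String[] args) throws CacheException {
        JCS cache = JCS.getInstance("records");
        cache.put("key-1", "first record");
        String value = (String) cache.get("key-1"); // null if absent
        System.out.println(value);
    }
}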
 
Raees Uzhunnan
Ranch Hand
Posts: 126
Answering several questions in this post: yes, we used the Tangosol caching product. It is excellent for in-memory caching, but not so good at synchronizing to disk. I will look at the JCS product.

Next: we want to write at most 10 MB to a file, and no more than that, because of the performance hit we have observed on files of 10 MB or larger. At 100 bytes per record, this will hold approximately 100,000 records.

No, the content of a record will not be updated, but inserts and deletes will occur for 100% of the records. The content will be homogeneous, and other threads in the JVM should be able to access it (access needs to be synchronized).

Yes, we also don't want to go beyond 100,000 records per collection on a file. As stated earlier, this file-based collection is intermediate storage for records until they get pushed to a database. The arrangement is purely for performance/latency when we have bursts of records coming in. A set of threads will read from this collection, persist the records to the database, and remove them from the collection.

If the events exceed this number, then we will bypass the collection and go directly to the database, taking the performance hit. I believe this will never happen, because we will have enough threads reading from the file and batch-updating the database.
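A sketch of the arrangement described above, assuming a bounded in-memory buffer with drain threads (the file persistence layer is omitted for brevity, and writeToDatabase is a hypothetical stand-in for the real batched JDBC insert):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Bounded buffer absorbing bursts: producers offer records, drain
// threads push them to the database, and overflow bypasses the buffer.
public class BurstBuffer {
    private static final int CAPACITY = 100000; // ~100,000 records
    private final BlockingQueue<byte[]> buffer =
            new ArrayBlockingQueue<byte[]>(CAPACITY);

    // Called for each incoming record.
    public void accept(byte[] record) {
        if (!buffer.offer(record)) {
            writeToDatabase(record); // buffer full: take the direct DB hit
        }
    }

    // Each drain thread runs this loop.
    public void drainLoop() throws InterruptedException {
        while (true) {
            writeToDatabase(buffer.take()); // blocks until a record arrives
        }
    }

    private void writeToDatabase(byte[] record) {
        // hypothetical stand-in for the real batched JDBC insert
    }
}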

[ December 18, 2007: Message edited by: Raees Uzhunnan ]
 
Ilja Preuss
author
Sheriff
Posts: 14112
You might want to take a look at http://www.prevayler.org for an alternative persistence solution. I don't have any experience with it, but it sounds interesting, and it might at least give you some new ideas.
 