

JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)

Concurrent file I/O & locking

David Goate
Greenhorn

Joined: Jan 08, 2013
Posts: 7
Hi everyone,

Firstly, I'd like to thank everyone for the effort and time they put into replying to posts in this forum - I have already found lots of useful advice and tips on the OCMJD course.

This is my first post and I'm hoping that someone can give me advice surrounding the concurrency requirements. I am currently implementing the Bodgitt & Scarper assignment and I have some questions about the implementation of the suncertify.db.Data class.

I have read that some people have taken an approach which involves loading the entire database contents into memory when the application starts and then operating entirely with the data in memory until application shutdown, at which point the in-memory content is written back to disk to be persisted in the database file. There were two main factors which put me off using this approach:
1) If for any reason the application closed unexpectedly (e.g. power cut, application crash, JVM crash, OS crash etc...) then the data held in memory would be lost.
2) Although the database size starts very small we have no indication of the projected growth and so reading the entire file into memory may require large heap sizes as the DB grows.

I appreciate that point 1 could be overcome in several ways, such as periodically writing the in-memory state out to disk, or keeping a separate file that acts as a kind of modification log and could be replayed in the event of a failure. But that seems a little over the top given the requirements and time scales involved.

As a result, I took the decision to implement each method of the required interface (read, update, delete, find, create and so on) with a method-local java.io.RandomAccessFile instance. In each method I create a new instance with the appropriate mode: in methods that only read data I use mode "r", and in methods that modify data I use mode "rwd" to ensure that writes happen synchronously (although I'm not entirely convinced that I have chosen the correct mode). Since each method has its own instance of java.io.RandomAccessFile, there is no chance that two threads will interfere with each other through overlapping seeks and reads. Furthermore, I guard access to the file with a global java.util.concurrent.locks.ReentrantReadWriteLock.
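To make this concrete, here is a stripped-down sketch of what I mean. The file name, the 32-byte record length and the single-string records are invented just for illustration; the real record schema comes from the assignment:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: a fixed-length, single-field ASCII record format invented for
// illustration. The real assignment schema (header, field lengths) differs.
public class Data {

    private static final int RECORD_LENGTH = 32; // invented record size

    // One global lock guarding the whole file: readers share, writers exclude.
    private static final ReentrantReadWriteLock fileLock = new ReentrantReadWriteLock();

    private final File dbFile;

    public Data(File dbFile) {
        this.dbFile = dbFile;
    }

    // Read-only method: mode "r", shared read lock, method-local RAF.
    public String read(int recNo) throws IOException {
        fileLock.readLock().lock();
        try (RandomAccessFile raf = new RandomAccessFile(dbFile, "r")) {
            byte[] record = new byte[RECORD_LENGTH];
            raf.seek((long) recNo * RECORD_LENGTH);
            raf.readFully(record);
            return new String(record, StandardCharsets.US_ASCII).trim();
        } finally {
            fileLock.readLock().unlock();
        }
    }

    // Mutating method: mode "rwd" (synchronous content writes), exclusive lock.
    public void update(int recNo, String data) throws IOException {
        byte[] record = new byte[RECORD_LENGTH]; // zero-padded fixed-length record
        byte[] src = data.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, record, 0, Math.min(src.length, RECORD_LENGTH));
        fileLock.writeLock().lock();
        try (RandomAccessFile raf = new RandomAccessFile(dbFile, "rwd")) {
            raf.seek((long) recNo * RECORD_LENGTH);
            raf.write(record);
        } finally {
            fileLock.writeLock().unlock();
        }
    }
}
```

My understanding is that "rwd" makes each write of the file content synchronous, so a crash should lose at most the write in progress, but as I said I'm not certain it's the right mode.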

Any method that just reads data first acquires the read lock, and any method that mutates data acquires the write lock. My thinking was that, at least in theory, multiple threads could be reading records concurrently, and an exclusive write lock would only be held while someone writes to the file. But now I question whether concurrent reads from different locations in the file are even possible at the JVM/OS/hardware level. If it is not possible to read/write the file concurrently, and access is effectively serial anyway, then my read/write lock solution seems redundant.

This approach seems to also have some drawbacks:
1) Since the file is guarded by a single java.util.concurrent.locks.ReentrantReadWriteLock, every update, delete and create temporarily locks the entire file, which could really hurt the scalability and throughput of my application: it would not even be possible to read/update/delete a completely different record from the one being updated.
2) Since each operation uses its own instance of the random access file, if many threads are running the application may hit the limit of open file descriptors.
3) Performance could be degraded by the overhead of opening a new file descriptor each time data is accessed.

My main priority, above all else, is to ensure data consistency and integrity, so concurrency and performance are secondary concerns. The marking criteria don't seem to explicitly offer any points for performance, scalability, concurrency etc., but I presume that some marks may be allocated in other sections such as general considerations.
I am aware that there are requirements for row locking in the interface for updating and deleting but in this post I am more concerned about locks to protect the data.

Does anyone have any advice about this or other approaches? Did anyone else pass with a good score using the approach outlined above?

Thanks in advance for your time, effort and knowledge.
Himai Minh
Ranch Hand

Joined: Jul 29, 2012
Posts: 737
To ensure data integrity, use this logic flow: lock, update/delete, unlock. One thread locks a record, making other threads that want that record wait. The thread updates/deletes the record, and when it finishes it wakes up all waiting threads.
Refer to Roberto's test class.

Inside the update/delete method, if you choose to use RandomAccessFile to update the database or cache, use the synchronized keyword to make sure only one thread accesses the database/cache at a time.
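A simplified sketch of that lock / update-delete / unlock flow (just an illustration, not Roberto's actual class; lock cookies and record validation are left out):

```java
import java.util.HashSet;
import java.util.Set;

// A thread that wants record recNo waits until no other thread holds it;
// notifyAll() on unlock wakes every waiter, and each one re-checks its record.
public class RecordLocker {

    private final Set<Integer> lockedRecords = new HashSet<>();

    public synchronized void lock(int recNo) throws InterruptedException {
        while (lockedRecords.contains(recNo)) {
            wait(); // record held by another thread: block until woken
        }
        lockedRecords.add(recNo);
    }

    public synchronized void unlock(int recNo) {
        lockedRecords.remove(recNo);
        notifyAll(); // wake all waiters so each can re-test its own record
    }

    public synchronized boolean isLocked(int recNo) {
        return lockedRecords.contains(recNo);
    }
}
```

The caller then follows the flow above: lock(recNo), update or delete the record, and unlock(recNo) in a finally block so the lock is always released.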

You can read the Apress SCJD 5.0 Study Guide for reference.


Regarding loading the database data:

1) If for any reason the application closed unexpectedly (e.g. power cut, application crash, JVM crash, OS crash etc...) then the data held in memory would be lost.
2) Although the database size starts very small we have no indication of the projected growth and so reading the entire file into memory may require large heap sizes as the DB grows.

You can choose either one. Performance is not a requirement of the exam, as I remember, but freedom from deadlock is required.

Dennis Grimbergen
Ranch Hand

Joined: Nov 04, 2009
Posts: 140

I chose to use a record cache, because it simplifies the source code and increases its maintainability. Furthermore, repeatedly seeking to positions in a database file and then writing data back may result in indeterminate behaviour (e.g. when your write position in the file is off by one).
I used a shutdown hook to make sure my data is written back to the file when the application exits normally. If something bad happens, like a power failure, then changes to the data may be lost. You could consider dedicating an extra thread to writing the data out more often. However, you can also just state in your choices.txt that this may be a future enhancement of the application. There are no requirements (as far as I know) about data integrity.
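As a rough sketch of the cache plus shutdown hook (the one-field, fixed-length record layout here is a simplified assumption, not the assignment's real schema):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Records live in memory; a shutdown hook writes them back on a normal exit.
public class RecordCache {

    private static final int RECORD_LENGTH = 32; // invented record size

    private final Path dbFile;
    private final Map<Integer, String> records = new TreeMap<>();

    public RecordCache(Path dbFile) throws IOException {
        this.dbFile = dbFile;
        load();
        // On a clean JVM exit, flush the cache back to disk.
        Runtime.getRuntime().addShutdownHook(new Thread(this::persistQuietly));
    }

    // Read the whole file into the cache once, at start-up.
    private void load() throws IOException {
        byte[] all = Files.readAllBytes(dbFile);
        for (int recNo = 0; recNo * RECORD_LENGTH < all.length; recNo++) {
            records.put(recNo, new String(all, recNo * RECORD_LENGTH,
                    RECORD_LENGTH, StandardCharsets.US_ASCII).trim());
        }
    }

    public synchronized String read(int recNo) {
        return records.get(recNo);
    }

    public synchronized void update(int recNo, String data) {
        records.put(recNo, data);
    }

    // Writes every cached record back to its slot in one pass.
    public synchronized void persist() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dbFile.toFile(), "rw")) {
            for (Map.Entry<Integer, String> e : records.entrySet()) {
                byte[] rec = new byte[RECORD_LENGTH];
                byte[] src = e.getValue().getBytes(StandardCharsets.US_ASCII);
                System.arraycopy(src, 0, rec, 0, Math.min(src.length, RECORD_LENGTH));
                raf.seek((long) e.getKey() * RECORD_LENGTH);
                raf.write(rec);
            }
        }
    }

    private void persistQuietly() {
        try { persist(); } catch (IOException ignored) { }
    }
}
```

The extra thread for periodic writes would simply call persist() on a timer; the hook only covers the normal-exit case.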


SCJP, SCWCD, SCJD
Darren Dimaapi
Greenhorn

Joined: Jan 28, 2010
Posts: 2
Hi David,

Since scalability and performance are not requirements, like you I chose to create a new RandomAccessFile for each of the CRUD methods and modified the database file directly (record updates/deletes). I clean up the RAF instance before each method ends. It may not be efficient, but in this assignment I was after the simplicity of the approach.

I first tried writing my db class with an instance variable holding the RAF, but I realized it was leading to complex code. For one thing, I would have to expose a cleanup method to clients of the db class to close down the RAF resources on shutdown. So I stuck to keeping the RAF local, preserving the db interface by not exposing any additional db methods to the client.

With regards to data integrity, I followed the popular route of making all the db's public methods synchronized and making sure that only one instance of the db exists. This way, only one thread at a time is allowed to touch the database file, through the db instance, for any of the CRUD operations (one RAF instance at a time).
So to whoever started and popularized this approach: many thanks for coming up with such a simple and fitting solution.
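Roughly like this (a sketch of the singleton-plus-synchronized pattern; the in-memory map just stands in for the method-local RAF work so the example is self-contained):

```java
import java.util.HashMap;
import java.util.Map;

// Exactly one Db instance exists, and every public CRUD method is synchronized
// on it, so only one thread at a time touches the underlying storage.
public class Db {

    private static final Db INSTANCE = new Db();

    // Stand-in for the database file accessed via a method-local RAF.
    private final Map<Integer, String> records = new HashMap<>();

    private Db() { } // no outside instantiation

    public static Db getInstance() {
        return INSTANCE;
    }

    public synchronized String read(int recNo) {
        return records.get(recNo);
    }

    public synchronized void create(int recNo, String data) {
        records.put(recNo, data);
    }

    public synchronized void delete(int recNo) {
        records.remove(recNo);
    }
}
```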

By the way, I passed in March last year.

Have fun!
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5216

First of all, a warm welcome to the JavaRanch!

I followed the same approach as Dennis described.

And just like he already described, you can add remarks in choices.txt about future application improvements without having to implement them. So for the 2nd problem you mention (DB growth) you could opt to load just the relevant data (e.g. there is no need to load contractors only available in the past, since you can't book them anymore).


SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
http://www.javaroe.be/
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5216

Darren Dimaapi wrote:I tried writing my db class having an instance variable of "RAF" the first time, but I realized it was leading to a complex code.

I don't understand why the code using an instance RAF variable is more complex than the code using a local RAF variable each time. The only difference is that you have to provide a shutdown hook to close the instance RAF variable, but I would not call that complex code...
Darren Dimaapi
Greenhorn

Joined: Jan 28, 2010
Posts: 2
Hi Roel,

First of all, I would like to give my sincerest acknowledgements, since I've noticed that I haven't given any.

I know it's too late, but let me thank all of you who contributed to the SCJD forum, especially our bartenders Roel and Roberto. Surely, it would have been very hard for me without your input.

With regards to your comment, I absolutely agree. Adding a hook is not complex and is perhaps the most efficient way. When I was writing my db class, my goal was to make it an easy read for a junior developer, since that's a requirement of the assignment I got, so I chose straightforward-looking code wherever possible. I thought option 2 below would be the easier read, since everything is revealed at first sight:

1. new single-instance RAF + do CRUD + shutdown-hook cleanup in client code on exit

vs

2. new RAF before each CRUD operation + do CRUD + clean up the RAF after each CRUD operation



but that's just me, and it is really subjective.

kind regards.
David Goate
Greenhorn

Joined: Jan 08, 2013
Posts: 7
Thanks for all the replies everyone, this is all very useful for me.

It seems that my approach is on the right track, but I've made my data access locking a little more complex than required. Based on the information here, and after weighing up the pros and cons of the different approaches, I think I'm going to continue with my approach of using a RAF for each operation, and I will use a global lock to control reads and writes to and from the file. I'll let you know how I get on - I'll probably end up posting a few more times before I am finished with my assignment ;)
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5216

Darren Dimaapi wrote:I know it's too late, but let me thank all of you who contributed to the SCJD forum, especially our bartenders Roel and Roberto. Surely, it would have been very hard for me without your input.

It's never too late!

And it's always nice to see that our help (and hard work) is appreciated.
 
 