JavaRanch Big Moose Saloon » Developer Certification (SCJD/OCMJD)

single raf vs. raf per client

Gytis Jakutonis
Ranch Hand

Joined: Feb 02, 2004
Posts: 76
Hello,
While investigating the single vs. multiple RAF options, I have some doubts about the latter. Isn't each RAF associated with a file handle? If so, the multiple-RAF solution can fail if the underlying OS reaches its file handle limit.
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Gytis,
While investigating the single vs. multiple RAF options, I have some doubts about the latter. Isn't each RAF associated with a file handle? If so, the multiple-RAF solution can fail if the underlying OS reaches its file handle limit.

I think that most of the people here use a single (static?) RAF instance.
Before I paused my assignment, I used NIO. When I start it again (within the next few weeks, and mostly from scratch I think), I'll use IO. So I have already thought about that issue, and the solution I have in mind is based on RAF connection pooling.
I see it as the best of both worlds:
- multiple RAFs for maximum throughput;
- the maximum number of RAF instances kept under control (through a property).
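Phil's pooling idea could be sketched roughly as follows. This is a minimal illustration, not his implementation: the class name `RafPool`, the `borrow`/`release` methods and the `"rw"` open mode are assumptions, only the maximum size is modelled (a minimum would pre-open handles in the constructor), and it relies on `java.util.concurrent`, which arrived in J2SE 5.0 and so was not available to assignments targeting 1.4.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical bounded pool of RandomAccessFile handles on one database file.
 * At most maxSize handles are ever opened, which caps the OS file handles used.
 */
class RafPool {
    private final File file;
    private final int maxSize;
    private final BlockingQueue<RandomAccessFile> idle;
    private int opened = 0; // guarded by "this"

    RafPool(File file, int maxSize) {
        this.file = file;
        this.maxSize = maxSize;
        this.idle = new ArrayBlockingQueue<RandomAccessFile>(maxSize);
    }

    /** Returns an idle handle, opens a new one while under the limit, or waits for a release. */
    RandomAccessFile borrow() throws IOException, InterruptedException {
        RandomAccessFile raf = idle.poll();
        if (raf != null) {
            return raf;
        }
        synchronized (this) {
            if (opened < maxSize) {
                RandomAccessFile fresh = new RandomAccessFile(file, "rw");
                opened++; // only counted once the open has succeeded
                return fresh;
            }
        }
        return idle.take(); // all handles are in use: block until one is released
    }

    /** Hands a handle back; never fails because at most maxSize handles exist. */
    void release(RandomAccessFile raf) {
        idle.offer(raf);
    }

    /** Closes the idle handles; call at shutdown once every borrower has released. */
    void closeAll() throws IOException {
        RandomAccessFile raf;
        while ((raf = idle.poll()) != null) {
            raf.close();
        }
    }
}
```

A caller would borrow a handle, do its seek and read inside `try`, and call `release` in `finally`, so that a failed read never leaks a handle and never leaves other clients waiting forever.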
Regards,
Phil.
Gytis Jakutonis
Ranch Hand

Joined: Feb 02, 2004
Posts: 76
Hi Philippe,
A RAF pool seems to be quite an interesting alternative to NIO. And if we consider that NIO is prohibited, this approach is the only alternative to FileChannels that can support concurrent reads. But with a RAF pool (or a RAF per client) we have to solve the 'physical' synchronization problem (prohibit a read if another thread is writing to the same record). So, considering the code complexity and the performance gain, I'm not so sure that multiple RAFs are better than a single RAF with fully synchronized reads and writes (without concurrent read support). Still, I'm curious how such 'physical' synchronization can be achieved:
1. multiple-readers/single-writer synchronization (i.e. prohibit any read while a write is in progress, and vice versa);
2. synchronization on a single record (prohibit any operation if another thread is working with that record, or prohibit only selected combinations such as read vs. write).
With both of these approaches I'm not sure how to synchronize search(), create() and delete(). delete() seems to be like write(), but the others are not so clear (should we block all write() calls while search() is running, etc.?)
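Gytis's first option (many concurrent readers, one exclusive writer) can be applied per record. A minimal sketch, assuming J2SE 5.0's `ReentrantReadWriteLock` (on 1.4 the same thing would have to be hand-coded with `wait`/`notifyAll`); `RecordIoLocks` and its method names are hypothetical, and locks are never removed from the map, which a real implementation would have to address:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical per-record "physical" lock table: any number of threads may
 * read a record at once, but a writer excludes readers and other writers of
 * that record. Different records never block each other.
 */
class RecordIoLocks {
    private final ConcurrentMap<Integer, ReadWriteLock> locks =
            new ConcurrentHashMap<Integer, ReadWriteLock>();

    private ReadWriteLock lockFor(int recNo) {
        ReadWriteLock lock = locks.get(recNo);
        if (lock == null) {
            ReadWriteLock fresh = new ReentrantReadWriteLock();
            lock = locks.putIfAbsent(recNo, fresh); // another thread may have won the race
            if (lock == null) {
                lock = fresh;
            }
        }
        return lock;
    }

    void beginRead(int recNo)  { lockFor(recNo).readLock().lock(); }
    void endRead(int recNo)    { lockFor(recNo).readLock().unlock(); }
    void beginWrite(int recNo) { lockFor(recNo).writeLock().lock(); }
    void endWrite(int recNo)   { lockFor(recNo).writeLock().unlock(); }

    /** Non-blocking variant: true only if no thread is reading or writing recNo. */
    boolean tryBeginWrite(int recNo) { return lockFor(recNo).writeLock().tryLock(); }
}
```

A read() would wrap its seek and read in `beginRead`/`endRead` inside `try`/`finally`, and update() and delete() would use `beginWrite`/`endWrite`. One option for search() is to take the read lock one record at a time, so a long scan never freezes writes to the whole file.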
Min Huang
Ranch Hand

Joined: Mar 17, 2004
Posts: 100
I'm going with multiple RAFs. Could somebody comment on my scheme?
I have a HashMap of record number => lock cookie in my Data class, which is a singleton. I synchronize on this HashMap in my lock/unlock methods.
No methods are synchronized except for create. You have to lock a record before writing or deleting it, and delete automatically unlocks the record, so you cannot own a lock on a deleted record.
It's the client's choice whether or not to lock the record before calling read.
Whenever a method that modifies the file is called, a new RAF is created, opened, used, and discarded when the method returns.
find locks a record before reading it, reads it, unlocks it, and goes on to the next one.
create looks for the first deleted record and starts writing the record data into that spot, then it sets the record to valid. create is synchronized, so only one create can run at any one time.
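Min's lock table might look something like the following. This is a guess at the shape of the scheme he describes, not his code: `LockTable`, the random `long` cookie and the use of `java.lang.SecurityException` for a wrong cookie are assumptions. Waiting threads block on the map's own monitor and are woken by `notifyAll` in `unlock`; in Min's scheme, `delete` would call `unlock` as its last step so that no lock survives on a deleted record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical lock table: record number -> lock cookie, guarded by the map's own monitor. */
class LockTable {
    private final Map<Long, Long> cookies = new HashMap<Long, Long>();
    private final Random random = new Random();

    /** Blocks until recNo is unlocked, then locks it and returns the owner's cookie. */
    long lock(long recNo) throws InterruptedException {
        synchronized (cookies) {
            while (cookies.containsKey(recNo)) {
                cookies.wait(); // woken by notifyAll() in unlock()
            }
            long cookie = random.nextLong();
            cookies.put(recNo, cookie);
            return cookie;
        }
    }

    /** Unlocks recNo, but only for the caller holding the matching cookie. */
    void unlock(long recNo, long cookie) {
        synchronized (cookies) {
            Long owner = cookies.get(recNo);
            if (owner == null || owner.longValue() != cookie) {
                throw new SecurityException("record " + recNo + " is not locked with this cookie");
            }
            cookies.remove(recNo);
            cookies.notifyAll();
        }
    }

    boolean isLocked(long recNo) {
        synchronized (cookies) {
            return cookies.containsKey(recNo);
        }
    }
}
```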


SCJP 1.4, SCJD 1.4, SCBCD (Preparing!)
Vitali Chalov
Greenhorn

Joined: Feb 01, 2004
Posts: 6
Hi guys,
I have been thinking about those exact issues for a while and completely
So I just want to throw another question into the discussion rather than an answer.
The design with a single RAF looks more appealing since it is much simpler (IMO anyway). However, it requires synchronization on the RAF instance. And this is exactly what bothers me: when I synchronize on this RAF instance, the whole record-locking mechanism becomes quite useless, doesn't it? Unless, of course, somebody wants to use pessimistic locking, where a lock is held for a considerable period. But when the lock is acquired immediately before an update/delete operation and released immediately after, especially in a so-called 3-tier design where the locking mechanism is not exposed to the client, it does not make any sense to me. Any comments?
Vitali Chalov
Greenhorn

Joined: Feb 01, 2004
Posts: 6
Hi Min,
I haven't found anything wrong with your design. But it is really hard to say whether everything is okay, because I personally think that implementing such a design is quite a task.
You do realize that when you synchronize only your create method, it means that you do not allow more than one thread to create a new record at a time, but other operations are allowed, right? So while thread t1 is creating a new record, thread t2 is permitted to perform a search. Then it depends on the implementation: the search needs the file length, but it can be changed by the create method at any moment, so accuracy and careful analysis are needed here. I find that quite overwhelming and time-consuming for this project, and this is why I have started looking back at the simpler design with just a single RAF (see my previous post). What do you think?
As for your idea of giving the client a choice whether to lock the record before reading it: it is an interesting idea. I'm still thinking about it. It might resolve the question of whether or not to allow dirty reads. Quite interesting.
Regards,
Vitali.
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11509

Hi Vitali,
When I synchronize on this RAF instance, the whole record-locking mechanism becomes quite useless, doesn't it?

Not at all. The synchronization on the RAF and the logical record locking have quite different purposes.
You are right in that you need to synchronize on the RAF to ensure that only one thread is accessing the file at any given time.
But booking a record raises a totally separate issue: ensuring that only one client books the record. This requires your booking process to check that the record is still available, and then book it. Since there are two steps involved, you need some way of making sure that no other thread can book your record between those two steps.
To make this a bit clearer, we are talking about two threads (A and B) both trying to book the same record:
  • Thread A checks that the record is still available
  • Thread B checks that the record is still available
  • Thread A books the record
  • Thread B books the record

Both threads now believe they have booked the same record.

Now you could avoid that by putting the booking code inside a synchronized block. However:
  • This reduces concurrent access to your database, as only one client at a time can be doing bookings.
  • This assumes your booking method is in the database server logic. If your booking method is in the client software, you cannot do this. This is not an argument for where the booking logic should be, though: see the long thread "Should lock methods be callable by the client" for arguments about where the booking method should be.

The better way of handling this is to use logical record locking around your booking code. Then, as long as one client owns the lock, it knows that no other thread can lock the record.
Returning to my earlier example of the two threads trying to book the same record:
  • Thread A attempts to lock the record - succeeds
  • Thread B attempts to lock the record - goes to wait state
  • Thread A checks that the record is still available
  • Thread A books the record
  • Thread A unlocks the record
  • Thread B attempts to lock the record - succeeds
  • Thread B checks that the record is still available - cannot book record
  • Thread B unlocks the record
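The lock, check, book, unlock sequence above can be sketched as follows. Everything here is hypothetical: an in-memory `Bookings` class stands in for the database file, a record counts as available when its owner string is empty, and the logical lock is a set of locked record numbers. The point is that the availability check and the update both happen while the calling thread holds the logical lock, so two threads booking the same record cannot both succeed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the database: record number -> owner ("" = available). */
class Bookings {
    private final Map<Integer, String> owners = new HashMap<Integer, String>();
    private final Set<Integer> locked = new HashSet<Integer>();

    Bookings(int records) {
        for (int i = 0; i < records; i++) {
            owners.put(i, "");
        }
    }

    /** Logical lock: waits while another thread holds recNo. */
    synchronized void lock(int recNo) throws InterruptedException {
        while (locked.contains(recNo)) {
            wait();
        }
        locked.add(recNo);
    }

    synchronized void unlock(int recNo) {
        locked.remove(recNo);
        notifyAll();
    }

    // Short "physical" operations; each one is atomic on its own.
    synchronized String read(int recNo) { return owners.get(recNo); }
    synchronized void update(int recNo, String owner) { owners.put(recNo, owner); }

    /** Lock, check, book, unlock. Returns false if the record was already booked. */
    boolean book(int recNo, String customer) throws InterruptedException {
        lock(recNo);
        try {
            if (!read(recNo).isEmpty()) {
                return false; // another client got there first
            }
            update(recNo, customer);
            return true;
        } finally {
            unlock(recNo);
        }
    }
}
```

Note that `book` itself is not synchronized: between `read` and `update` other threads can freely work on other records, and only a thread wanting the same record has to wait.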

Regards, Andrew

The Sun Certified Java Developer Exam with J2SE 5: paper version from Amazon, PDF from Apress, Online reference: Books 24x7 Personal blog
Vitali Chalov
Greenhorn

Joined: Feb 01, 2004
Posts: 6
Hi Andrew,
Thanks for your thorough reply. Unfortunately, it has not dissolved my doubts. Let me try to rephrase my question. There is no doubt that record locking must be implemented: it is probably an automatic failure otherwise. So we implement it.
Now, one of the biggest design choices for me is how many RandomAccessFile (RAF) instances to use:
  1. a single instance (through an enclosing singleton, a static reference, etc.);
  2. one instance per client;
  3. there is a third option, which Min is using: one instance per request, but for the purposes of this discussion (to reduce the scope) I suggest we leave it alone.
So the question is: to be or not to be, single or one per client?
I am saying that a single instance is way easier to implement. However, implementing it makes the record-locking mechanism ineffective in terms of providing "concurrent access to your database".
I take your scenario and extend it a bit (we also assume that there is no caching: we work directly with the database file):
  • Thread A attempts to lock record 1 - succeeds
  • Thread B attempts to lock record 2 - succeeds
  • Thread A acquires the object lock on the RAF (enters the synchronized block)
  • Thread A checks that the record is still available
  • Thread B tries to get the object lock on the RAF and blocks
  • Thread A books the record; B is blocked
  • Thread A releases the object lock on the RAF (leaves the synchronized block)
  • Thread A unlocks the record
  • Thread B runs and, if lucky, gets the object lock on the RAF
  • Thread B checks that the record is still available - cannot book record
  • Thread B releases the object lock
  • Thread B unlocks the record

Sorry, this is lengthy, but I hope you get the point: a single RAF instance does not let you benefit from the record-locking mechanism. Database operations block and are still performed sequentially, even on a multi-processor machine.
I have seen postings here from people who passed with a single-RAF design. I wonder if this is because there is another requirement in the assignment: "A clear design, such as will be readily understood by junior programmers, will be preferred to a complex one, even if the complex one is a little more efficient". Does that justify using a single RandomAccessFile? Or do I simply not get it at all, and there is nothing wrong with using a single instance of RandomAccessFile?
Thank you,
Vitali.
[ April 05, 2004: Message edited by: Vitali Chalov ]
Gytis Jakutonis
Ranch Hand

Joined: Feb 02, 2004
Posts: 76
Hi Vitali,
A few notes on your scenario:
  • Thread A attempts to lock record 1 - succeeds
  • Thread B attempts to lock record 2 - succeeds
  • Thread A acquires the object lock on the RAF (enters the synchronized block)
  • Thread A checks that the record is still available
  • Thread B tries to get the object lock on the RAF and blocks
  • Thread A releases the object lock on the RAF <- A has just finished read()
  • Thread A or Thread B acquires the object lock on the RAF <- concurrency
You are right that database operations are performed sequentially, but the business logic (booking in this case) runs concurrently.
And as for single RAF vs. multiple RAFs, read my post once again: I was referring to the OS file handle limit. Suppose that there are thousands of active connections, and each of them gets a RAF instance. At some point the OS file handle limit may be reached, and new connections will not get their RAF and will be blocked completely. Also consider physical (low-level) file I/O: I'm not so sure that multiple I/O operations can be performed at the same time on the same file (or on different files), except perhaps with a multi-head hard drive (IMO). So database (file, in our case) access will always be sequential at some level, and concurrent requests will be blocked anyway.
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11509

Hi Vitali,
Normally you would not synchronize the entire booking method: as you yourself have noted, this reduces concurrency.
All you need to do is synchronize the reading from and writing to the file; the rest of the processing happens in non-synchronized blocks.
So:
  • Thread A locks record 1
  • Thread B locks record 2
  • Thread A reads record 1 inside a synchronized block
  • Thread B reads record 2 inside a synchronized block
  • Thread A converts the block of data read into a field structure - this is outside the synchronized block
  • Thread B converts the block of data read into a field structure - this is outside the synchronized block
  • Thread A checks record availability - this is outside the synchronized block
  • Thread B checks record availability - this is outside the synchronized block
  • Thread A sets the fields necessary for booking the record - this is outside the synchronized block
  • Thread B sets the fields necessary for booking the record - this is outside the synchronized block
  • Thread A writes record 1 inside a synchronized block
  • Thread B writes record 2 inside a synchronized block
  • Thread A unlocks record 1
  • Thread B unlocks record 2

This way you potentially have a lot of concurrent work happening, while still maintaining integrity.
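The sequence above maps onto code roughly like this: one shared `RandomAccessFile`, with only the `seek` plus `readFully`/`write` pair inside `synchronized (raf)`, while converting between bytes and fields happens outside the block. A minimal sketch; `SharedRafStore`, the fixed record length, the space padding and the US-ASCII encoding are assumptions, and the logical record lock from the booking flow is still needed on top of it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/**
 * Hypothetical record store sharing ONE RandomAccessFile among all threads.
 * Only the seek + read/write pair is synchronized on the raf; converting
 * between bytes and text happens outside the lock, concurrently.
 */
class SharedRafStore {
    private final RandomAccessFile raf;
    private final int recordLength;

    SharedRafStore(RandomAccessFile raf, int recordLength) {
        this.raf = raf;
        this.recordLength = recordLength;
    }

    String read(int recNo) throws IOException {
        byte[] raw = new byte[recordLength];
        synchronized (raf) { // seek + read must not interleave with other threads
            raf.seek((long) recNo * recordLength);
            raf.readFully(raw);
        }
        return new String(raw, "US-ASCII").trim(); // decoding: outside the lock
    }

    void write(int recNo, String value) throws IOException {
        byte[] raw = new byte[recordLength]; // encoding: outside the lock
        Arrays.fill(raw, (byte) ' ');
        byte[] text = value.getBytes("US-ASCII");
        System.arraycopy(text, 0, raw, 0, Math.min(text.length, recordLength));
        synchronized (raf) {
            raf.seek((long) recNo * recordLength);
            raf.write(raw);
        }
    }
}
```

The synchronized block exists only because `seek` and the following read or write share the file pointer; without it, another thread's `seek` could slip in between and the read would land on the wrong record.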
Regards, Andrew
Vitali Chalov
Greenhorn

Joined: Feb 01, 2004
Posts: 6
Hi guys,
I want to thank you all for the discussion. I am trying to make up my mind now.
Andrew, just out of curiosity, which design did you choose in your solution: a single RAF or multiple instances?
And you did expose the lock/unlock methods to the client, right?
I realize that Sun's instructions say that "..., marks are awarded for a clear and consistent approach, rather than for any particular solution." But still, if I may ask, what scores did you get on the data class and locking?
Thank you,
Vitali.
Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11509

Hi Vitali,
I did the older "Fly By Night Services" assignment. This meant that I had different "challenges" from those doing the current assignments. It also means that most of your questions did not apply to me.

Andrew, just out of curiosity, which design did you choose in your solution: a single RAF or multiple instances?

No choice in this matter.

And you did expose the lock/unlock methods to the client, right?

Again, no real choice. My instructions were explicit that the client had to be able to call the locking methods.

If I may ask, what scores did you get on the data class and locking?

Effectively, I got 97%.
Regards, Andrew
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Gytis and Vitali,
Gytis:
And as for single RAF vs. multiple RAFs, read my post once again: I was referring to the OS file handle limit. Suppose that there are thousands of active connections, and each of them gets a RAF instance. At some point the OS file handle limit may be reached, and new connections will not get their RAF and will be blocked completely.

That's why I mentioned "RAF connection pooling" above, meaning that you can set (ideally through properties) a minimum and a maximum number of file handles in use.
Gytis:
Also consider physical (low-level) file I/O: I'm not so sure that multiple I/O operations can be performed at the same time on the same file (or on different files), except perhaps with a multi-head hard drive (IMO). So database (file, in our case) access will always be sequential at some level, and concurrent requests will be blocked anyway.

You're right that the benefits of concurrent file access will be system-dependent.
But after reading your post, I decided, just out of curiosity, to quickly write a small test to check it on my system.
Here are its outputs:

In conclusion, on my system (Windows XP Professional, but I have no idea how many heads my hard drive has), the concurrent version is more efficient by:
  • For 5 threads: 257%
  • For 10 threads: 270%
  • For 20 threads: 395%
  • For 50 threads: 914%
Here is the test code:

Vitali:
"A clear design, such as will be readily understood by junior programmers, will be preferred to a complex one, even if the complex one is a little more efficient."
Does that justify using a single RandomAccessFile? Or do I simply not get it at all, and there is nothing wrong with using a single instance of RandomAccessFile?

My tests above show that concurrent RAF access may be much more efficient than "just a little".
So I think that RAF connection pooling is defensible according to the instructions.
Now, performance is *not* a requirement for this assignment, at least not an explicit one, which means that synchronizing all file accesses on a single RAF instance is defensible too...
Now, given all the aspects you must take into account to make it work, and work efficiently:
- connection pool management (with no mistakes!)
- allowing, at the record level, multiple reads but exclusive writes (reads block while a record is being written),
is it worth it? I made my choice, but it can be discussed.
Regards,
Phil.
Min Huang
Ranch Hand

Joined: Mar 17, 2004
Posts: 100

You do realize that when you synchronize only your create method, it means that you do not allow more than one thread to create a new record at a time, but other operations are allowed, right?

Yes... I realize that, but it was the only way I could think of to prevent two simultaneous creates from scanning the database file, finding the same deleted record, and then both creating a record in the same spot...

So while thread t1 is creating a new record, thread t2 is permitted to perform a search. Then it depends on the implementation: the search needs the file length, but it can be changed by the create method at any moment, so accuracy and careful analysis are needed here.

Yeah, I realized that too. A search operation checks the valid flag first, before reading the rest of the record data. If the record is not valid, the search skips it and goes on to the next one. You can use the record length (obtained from the database schema) to work out how many bytes to skip. As the create operation does not set a record to valid until it has finished writing the record data into the deleted slot, nobody will interfere with the deleted record that create is working on. (In my implementation, you cannot own a lock on a deleted record: once you delete, you lose the lock.)
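Min's "data first, valid flag last" approach could look something like this. It is a sketch under assumed conventions, not his code: each record is one flag byte followed by `DATA_LENGTH` data bytes, `VALID`/`DELETED` are hypothetical flag values (the real file format defines its own), appending when no deleted slot exists is an assumption, and each operation opens and closes its own `RandomAccessFile` as in Min's design. `create` is `synchronized` so two creates cannot claim the same slot; `findValid` is not, and it only looks at whole records present when it starts. Min's find also takes the logical lock on each record before reading its data, which is left out here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical layout: each record is 1 flag byte followed by DATA_LENGTH data bytes. */
class FlaggedFile {
    static final byte VALID = 0;   // hypothetical flag values
    static final byte DELETED = 1;
    static final int DATA_LENGTH = 8;
    static final int RECORD_LENGTH = 1 + DATA_LENGTH;

    private final File file;

    FlaggedFile(File file) {
        this.file = file;
    }

    /** Reuses the first deleted slot (or appends); synchronized so two creates never pick the same slot. */
    synchronized long create(byte[] data) throws IOException {
        if (data.length != DATA_LENGTH) {
            throw new IllegalArgumentException("expected " + DATA_LENGTH + " bytes");
        }
        RandomAccessFile raf = new RandomAccessFile(file, "rw"); // one RAF per operation
        try {
            long count = raf.length() / RECORD_LENGTH;
            long recNo = 0;
            while (recNo < count) {
                raf.seek(recNo * RECORD_LENGTH);
                if (raf.readByte() == DELETED) {
                    break;
                }
                recNo++;
            }
            long offset = recNo * RECORD_LENGTH;
            raf.seek(offset);
            raf.writeByte(DELETED); // an appended slot starts out invisible to searches
            raf.write(data);        // 1) data first, while the slot is still DELETED
            raf.seek(offset);
            raf.writeByte(VALID);   // 2) valid flag last
            return recNo;
        } finally {
            raf.close();
        }
    }

    /** Caller is assumed to hold the logical lock on recNo (not shown). */
    void delete(long recNo) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            raf.seek(recNo * RECORD_LENGTH);
            raf.writeByte(DELETED);
        } finally {
            raf.close();
        }
    }

    /** Not synchronized: counts whole records present at the start and skips non-VALID ones. */
    List<Long> findValid() throws IOException {
        List<Long> result = new ArrayList<Long>();
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            long count = raf.length() / RECORD_LENGTH;
            for (long recNo = 0; recNo < count; recNo++) {
                raf.seek(recNo * RECORD_LENGTH);
                if (raf.readByte() == VALID) {
                    result.add(recNo);
                }
            }
        } finally {
            raf.close();
        }
        return result;
    }
}
```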

I find that quite overwhelming and time-consuming for this project, and this is why I have started looking back at the simpler design with just a single RAF (see my previous post). What do you think?

In retrospect... I don't know. Maybe. But I already did it, so it's too late now =P
As for the OS running out of resources because of all these file handles: in my implementation, the file is closed after each operation, so it's not as if each client opens his own RAF and keeps it open until the client dies. But I am still worried about running out of resources. I am noting in my design choices document that the server should later be modified to use FileChannels instead of multiple RAFs (unfortunately, NIO is out of the question for the assignment, so I did not use it).
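The FileChannel route Min mentions avoids both the pool and the big lock: `FileChannel.read(ByteBuffer, long)` and `write(ByteBuffer, long)` take an explicit position and leave the channel's own position untouched, so many threads can share one channel without a `synchronized` block around seek plus read (whether they actually proceed in parallel is up to the platform). NIO was ruled out for the assignment, so this only illustrates the alternative; `ChannelStore` and the fixed record length are assumptions.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Hypothetical store sharing ONE FileChannel. Positional read/write calls do not
 * touch the channel's own position, so no synchronized block is needed around them.
 */
class ChannelStore {
    private final FileChannel channel;
    private final int recordLength;

    ChannelStore(RandomAccessFile raf, int recordLength) {
        this.channel = raf.getChannel();
        this.recordLength = recordLength;
    }

    byte[] read(long recNo) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(recordLength);
        long start = recNo * recordLength;
        while (buf.hasRemaining()) { // a single read may return fewer bytes
            if (channel.read(buf, start + buf.position()) < 0) {
                throw new EOFException("record " + recNo + " is past the end of the file");
            }
        }
        return buf.array();
    }

    void write(long recNo, byte[] data) throws IOException {
        if (data.length != recordLength) {
            throw new IllegalArgumentException("expected " + recordLength + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        long start = recNo * recordLength;
        while (buf.hasRemaining()) {
            channel.write(buf, start + buf.position());
        }
    }
}
```

Logical record locking is still needed for check-then-book; what disappears is the need to serialize every seek and read on one object monitor.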