This week's book giveaway is in the OCMJEA forum. We're giving away four copies of OCM Java EE 6 Enterprise Architect Exam Guide and have Paul Allen & Joseph Bambara on-line! See this thread for details.
Hi, This is a review, for me, and touches on some new ideas (for me). You would be most kind to correct any incorrect notions I may present, or offer competing, better design ideas. ------ Part 1 ------ When I first started out in this group, my initial "brilliant" design plan was as follows:
Where let MyData contain a subset of methods with a similar signature to the interface Sun provides. For simplicity, let's say that there are three methods: read(): reads data from a record. update(): changes the data for a record. newWrite(): creates a new record in the database. In this discussion, the methods lock(), unlock(), and isLocked() will not appear within MyData but when introduced, will be in another class or interface. And, this idea solved what at the time appeared to be everything. But, as Andrew pointed out, this was not sufficiently concurrent. That is, the three methods were not always accessing the random access file, so there was no reason to make the complete method synchronized, as this essentially locks down the complete database and slows things down; that is, it goes against concurrency. So, let's have MyData use MicroData whenever it wants to do a micro operation; MicroData also has the same three methods, read(), update(), newWrite(), but these involve no overhead in that these methods completely spend their time communicating to the random access file. Thus, within MicroData, every method can be synchronized without losing any concurrency. We now have this structure:
Now we come to my next "brilliant" idea. With the introduction of MicroData, we can now with ease lock the entire database, simply by introducing a new synchronized method called lockDb(). Then, this makes it possible to use the newWrite() method safely to create new records in the file; that is, you would first call lockDb() sending it a command to forward a call to newWrite(). Question: Does this sound alright; or is it another cheap trick with a flaw? (For instance, my initial "brilliant" idea spoken about at the top of this article was flawed in that it lessened concurrency). Now, the design isn't quite correct yet because MyData is multi-threaded, and you would experience failure if two threads simultaneously operated on the same record. Thus, in this group I read about "mutex", found the algorithm at Doug's web site, and posted simple articles in this group to learn about elementary locking theory. Once you understand that one mutex is used to control access to a shared resource, you immediately expand this to the realization that each record in the file is a shared resource, so you need a collection of mutex-like objects, where each such mutex-like object relates to one record. This collection of mutex-like objects I will call class NLock.
If NLock is a legacy collection (Hashtable or Vector, for example), then the complete collection is synchronized, and thus concurrency is lost; so, this would be a bad idea. For our initial thoughts, let's say that NLock is an unsynchronized ArrayList with a length equal to that of the physical database random access file's number of records. Within each index of NLock will reside our mutex-like object related to each record in the file. An ArrayList will work fine in a multi-threaded environment as long as we never change the size of the ArrayList. But, as the JavaDoc's say, you cannot change the size of an ArrayList in a multi-threaded environment; thus, we come upon our first major design problem: how to implement newWrite. ------ Part 2 ------ I am not sure what the consensus is regarding newWrite(); do people feel they need to have the functionality built in to create a new record in the file? I don't know. Let's assume, then, that for this discussion, we do want to implement newWrite so that a new record can be added to the RAF. Given that we have decided to find a way to implement newWrite(), this implies that at some point during processing, the size of the ArrayList NLock must be able to get larger. When we want to do a newWrite operation, the following steps need to be taken: 1. Stop all incoming threads to NLock and get them to wait around until told to run. 2. Wait for all threads currently using NLock to complete. 3. Finally, when no thread is using NLock, send one thread in to carry out the newWrite operation which will add a record to the RAF and add an element to the ArrayList NLock. 4. When done, allow all those threads waiting to get access to NLock to proceed. The above four steps probably can be implemented in two ways (or more): 1. As a queue, where the Queue class would sit just before the NLock class. I would have to investigate this further, but the Queue controls thread access to NLock, and knows which threads have not yet returned from NLock. 2. As another form of a locking mechanism; perhaps best investigated by trying to find such a locking algorithm on the internet, or by trying to design it myself, or both. This might mean extending the Thread class to make our own customThread; it might mean using the observer-observable pattern; but, these are just vague ideas at this point. One thing is clear, however, when a thread is "waiting" to get access to NLock, it should not be using up CPU cycles. And, another stumbling block to miss is that we don't want this new class to erroneously lock down the complete database implicitly all the time. Question: I would be curious if there are any other design ideas for locking the database and increasing its size which I have not thought of or yet learned about. Before rushing in to build a new class to handle the problem of adding a new record to the RAF and to the NLock, we might consider if changing the class of NLock from an ArrayList to a Map might help us. That will be considered in the next section. ------ Part 3 ------ This part investigates whether changing NLock from an ArrayList to a Map will help us in implementing the newWrite() method safely. Given that JavaDoc says this: Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map.
it does not solve our problem of growing the map's size. Now, it's true that for certain efficiency reasons we may prefer to implement NLock as a Map instead of an ArrayList, but that is an implementation detail. The point is that as concerns our ability to use a given Java unsynchronized collection which we can grow in size, there appears to be no off-the-shelf class for our use. ------ Part 4 ------ This part brainstorms on a solution to implementing newWrite. Given that we have decided to find a way to implement newWrite(), this implies that at some point during processing, the size of the ArrayList NLock must be able to get larger. When we want to do a newWrite operation, the following steps need to be taken: 1. Stop all incoming threads to NLock and get them to wait around until told to run. 2. Wait for all threads currently using NLock to complete. 3. Finally, when no thread is using NLock, send one thread in to carry out the newWrite operation which will add a record to the RAF and add an element to the ArrayList NLock. 4. When done, allow all those threads waiting to get access to NLock to proceed. One possible,theoretical idea, is to think about the lock as having behavior: 1. The lock sees two classes of individuals attempting to gain the lock for a record: classReadUpdate is a read or update lock request on a record; classNewWrite is a newWrite request on the complete database. 2. The lock associated with each record changes its behavior. The lock's default behavior is to accept a classReadUpdate. But, once a SignificantEvent has occured, it's state changes so that it accepts no new locking requests; furthermore, all locks within NLock are transformed in their behavior so that they accept no new locking requests. 3. If due to multi-threading, two or more locks within NLock have accepted a classNewWrite request, this is an implementation error; that is, newWrite requests may also need to be synchronized off a separate object called NewWriteSyncRequest before that thread is allowed to request a lock from NLock. Having these newWrite requests be synchronized is not seen as a problem, as the system can be designed not to frequently use the newWrite function by, say, always adding at least 100 inactive new records to the RAF and NLock at a time. In fact, if you plan to add 100 new inactive records at a time, there is no need to have multiple newWrite requests at all. Thus, all newWrite requests can be targeted against one unique object (NewWriteSyncRequest) contained within NLock and not associated with any particular record; if this newWrite request is still waiting to get going and another newWrite request comes in, you just ignore the latest request. Or it could be that you allow the newWrite requests to queue up,but if there are already 10 or more inactive records at the end of the file, the newWrite request is ignored. 4. Once one newWrite request has obtained control of the NewWriteSyncRequest, just like an insect, a "chemical" is sent to all the locks within NLock, and these locks transform their behavior so that they no longer accept any new locking requests. At some point, all the threads that currently hold locks on the mutex-like objects within NLock will complete their work; and, mutex-like object within NLock cannot transform its state until any thread that holds its lock has released its lock. 5. At some point, every mutex-like object within NLock will be transformed so that it accepts no more locks from classReadUpdate requests; and, this can only occur if there are no active threads operating on any records. At this time, a new, macro-chemical event is sent to the NewWriteSyncRequest sub-system, and it knows that it is finally safe to do the following: a. Add 100 new inactive records to the RAF. b. Add 100 new locks to the NLock ArrayList. 6. When these new, inactive records have been added, an "enable chemical" is sent to every element within NLock to change it's behavior back to its default state where again each mutex-like object accepts requests from the classReadUpdate, and individual records are locked and unlocked as before. 7. Some aspects of the above design idea may be able to leverage off the fact that when MicroData's lockDb() method is called, the complete database file is locked down. 8. The above ideas suggest that we now have some reason to know if a given record is locked or not; that is, the isLocked() method now might actually have a use in this application. Implementing a boolean nLock.isLocked(recordNumber) method is considered rather trivial, since every mutex-like object within NLock already knows whether or not it is locked or unlocked. You can probably optimize, with respect to time, the ideas given above by having the macro-chemical sent out as soon as you know the thread has entered the synchronized microData.lockDb() method. Notes on chemicals used: 1. The current size of NLock is known to be N. 2. The NewWriteSyncRequest object sends the chemical to each cell within NLock. 3. Each cell, once transformed to accept no incoming lock requests, sends another chemical signature back to NewWriteSyncRequest, and NewWriteSyncRequest sums up these receipts until they equal N, and at that time NewWriteSyncRequest knows the database is locked down, and sends a thread into microData.lockDb() which in turn invokes microData.newWrite(). 4. Once NewWriteSyncRequest is done, it sends out a final chemical to each cell telling each cell to now start accepting record lock requests. The above ideas may be implemented by changing NLock from an ArrayList to a Map. Either collection will do, and may do as equally, but I will find out when I enter a more detailed design level. The point is that whatever collection is used, it must be unsynchronized, and it must currently hold at least the number of records currently in the physical, database random access file. ------ Part 5 ------ Our design now looks like this:
What is not explicitly shown is that NLock (based upon the writing ideas in parts 2 and 4) also has a mechanism for locking down both NLock itself in its entirety and the RAF (either implicitly or explicitly by invoking microData.lockDb() which itself implicitly locks the RAF due to all its methods being synchronized). One thing I've seen mentioned in this group concerns lock ownership with respect to locking and unlocking. That is, it is undesirable, and in fact illogical for one thread to unlock a record which another thread locked. In the above design, this is not really an issue because MyServer is doing all the work, and the lock(), process(), unlock() string of commands are always performed within one thread. But, maybe I should think about this to be certain. Okay, it appears that this may be an issue. Here is an example: Harry requests and obtains a lock on record 1. Tom requests a lock on record 1 but waits. Therefore, no one can interfere with Harry. Harry processes record 1. Harry unlocks record 1. However, there could be a problem if the programmer makes an error thus: Harry requests and obtains a lock on record 1. Tom requests an unlock on record 1 erroneously. Dick requests and obtains a lock on record 1. Now Dick and Harry both believe they can operate simultaneously on record 1, for after all, lock and unlock calls are programatically unenforcable conventions. Dick and Harry have now quite possibly messed up the database record 1, or maybe even the database itself. However, I'm not sure that one can put in every safeguard against illegal programming. For instance, assume Tom, Dick, and Harry are integer ID's. We now have this:
What stops a programmer from (1) leaving out the if test on isLocked()?, and (2) what stops the programmer from erroneously coding isLocked(1)? Nothing. For acquiring statistics during debugging and testing, it might be nice that each mutex-like object within each cell of NLock keep track of how many lock and unlock requests it processed, and how many chemical changes it underwent. At any time, one can try to completely lock NLock, and then check all these flags to make sure that for each mutex-like object, the number of locks and unlocks that particular mutex-like object processed are equal. For debugging, and given that programming errors do occur, it might be nice to have a rudimentary ID system. But, this need not be associated with the client, it can be associated with a code block. Each mutex-like object assigns a new ID when it is locked; this can start on some random integer value and increment by 1 (wrapping to a negative value is fine). But, this does not assure you that any two records necessarily have unique ID's, but it might, perhaps, find an error, but it is not assured. You would not want to create a static ID for NLock, for then concurrency suffers. It might be tempting to use the ClientID as an identifier, but it may be just as desirable not to exclude Mr. Smith from using that client ID from work while Mrs. Smith uses that same client ID from her place of work. Nor could you combine the client ID with the computer network number, since Mr. and Mrs. Smith could be time-sharing the same computer, Mr. Smith making a reservation "simultaneously" (if the system if slow enough) while Mrs. Smith makes her reservation. I have read in this group that some consider sending the MyData object from the client to the server, and that these MyData objects are by definition unique. This certainly sounds like MyData might be unique in this context, and as such would make an excellent ID. Presumably, even if two different computers functioned absolutely identically, if they each sent their MyData object to our server, our server would understand that myData1 != myData2, because myData1 and myData2 are unique objects existing within the Java run-time environment of our server (but I don't know this with certainty). This means that we have a little extra complexity, for when our server accepts the myData1 object, our server becomes, within this short context, the client, while our client becomes our server. But, all in all, assuming myData being sent to us from the client really is unique, it is probably the best, unique identifier available. However, there is no MyData on the client. At least in this particular implementation, because MyData exists only on the server. So, this still isn't a problem, the client can send over any predefined object that will work as its ID, just so every client sends over the same object type. Let's simply say that this ID is of type Object. Let's define Tom, Dick, and Harry as type Object wherein each was sent over from the client and each has a unique address within our server's JVM. And, let's see how the locking and unlocking might work; note that Tom, Dick, and Harry are each operating in separate threads:
Now we don't have to worry about coding errors where the ID is confused with the record number. And, we won't be using Tom, Dick, or Harry as the ID object, but a variable called, let's say, lockId, so the code looks pretty clean for each thread:
Now the process() method may be able to enforce that the lock has been obtained thus:
Or, it may make more sense to defer any check on nLock.isLocked() until the very low read and update methods (the check for isLocked would not be required for newWrite given the design). ------ Part 6 ------ Our design now looks like this:
What is not explicitly shown is that NLock (based upon the writing ideas in parts 2 and 4) also has a mechanism for locking down both NLock itself in its entirety and the RAF (either implicitly or explicitly by invoking microData.lockDb() which itself implicitly locks the RAF due to all its methods being synchronized). Also, what is not explicitly shown is that the lock, unlock, and isLocked methods now have Object ID's associated with them and that these ID's were sent over from the client; these ID's are stored in each mutex-like object cell within NLock. It is interesting to consider the major, competing macro design, and this is when the client directly accesses the methods of MyData.
Even though I wrote in the above figure that MyData uses NLock, NLock is really the guy in charge; that is, NLock controls the access to MyData's read and update methods. How you might integrate NLock more closely with MyData is not being discussed as it is considered an implementation detail. Note that the above design probably doesn't need to be re-worked too much with respect to newWrite, because NLock controls MyData, and when NLock is totally locked down, MyData becomes totally locked down too. It is interesting that because we defined the MicroData class, we are not forced to store the file into memory like this:
Presumably if the file is stored in memory, then the reading and writing of the file would only occur when the server starts up or shuts down; or perhaps added logic is necessary to periodically save the database in memory to disk. I'm not sure I see any benefit from storing the file in memory, and simultaneously updating the FileInMemory as well as simultaneously updating the physical, database random access file. But, this is not a design which I have studied before. More than likely, I have seen references in this group where some developers have made the FileInMemory class a sophisticated cache used to enhance performance (for instance, on a read, you don't always have to read from disk, but can read from the FileInMemory class instead). For simplicity, we will consider the competing design in its simplest form:
Note that the now missing MyServer class did simple transforms between English-like commands to specific calls to lock, process, and unlock. It may be desirable to move some of these more English-like, server-like methods over to the client side thus:
and in this way, the "client" can talk to its local "server" using English commands like this: myLocalServer.readAllRecords(). There is one, perhaps significant, added complexity this competing design adds, and that is how to deal with client requests which die. I speculate that this would occur in this scenario: 1. the client calls remote myData.lock(lockId, recordNumber = 1); 2. the client machine is unplugged! 3. now record 1 on the server is locked forever. Based upon my light readings in this group on this topic, the following terms and the following transformations would probably be made: 1. NLock would be transformed into a particular Map such that when the remote ObjectId from the client "died", the lock would be released. 2. "weak reference" term is used. As you can see, I don't know much about this particular aspect of this competing design, but I am writing about it to discover its differences and similarities, and to see "what I'm missing" by not implementing this particular design. I may, at some future point, simply to get "full exercise" of my skills, transform my design to this design, since it adds this extra complexity which may be important to learn about (also, it's possible that Sun has changed its exam so that this particular design is now required, thus suggesting that Sun considers it important for us to investigate and implement, though, for me, it is not required). If I'm not mistaken, there are numerous threads in this group relating to this particular aspect, but I have not yet absorbed them. When I learn more about this topic, I will probably edit this posting. ------ Part 7 ------ Here are a few of the designs listed from top to bottom as approximately easiest to hardest to implement:
where FIM means "FileInMemory" and where it could be implemented as holding only parts of the file or all the file. There may be other variants of the above, but I think that there are basicaly two main variations, and for each variation, you may or may not have a FIM. Thanks, Javini Javono