aspose file tools*
The moose likes Developer Certification (SCJD/OCMJD) and the fly likes Design 01 Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)
Bookmark "Design 01" Watch "Design 01" New topic
Author

Design 01

Javini Javono
Ranch Hand

Joined: Dec 03, 2003
Posts: 286
Hi,
This is a review, for me, and touches on some new ideas (for me). You would be
most kind to correct any incorrect notions I may present, or offer competing,
better design ideas.
------
Part 1
------
When I first started out in this group, my initial "brilliant" design plan was
as follows:

Where let MyData contain a subset of methods with a similar signature to the
interface Sun provides. For simplicity, let's say that there are three
methods:
read(): reads data from a record.
update(): changes the data for a record.
newWrite(): creates a new record in the database.
In this discussion, the methods lock(), unlock(), and isLocked() will not appear
within MyData but when introduced, will be in another class or interface.
And, this idea solved what at the time appeared to be everything. But, as Andrew
pointed out, this was not sufficiently concurrent. That is, the three methods
were not always accessing the random access file, so there was no reason
to make the complete method synchronized, as this essentially locks down the
complete database and slows things down; that is, it goes against concurrency.
So, let's have MyData use MicroData whenever it wants to do a micro operation;
MicroData also has the same three methods, read(), update(), newWrite(), but
these involve no overhead in that these methods completely spend their time
communicating to the random access file. Thus, within MicroData, every method
can be synchronized without losing any concurrency.
We now have this structure:

Now we come to my next "brilliant" idea. With the introduction of MicroData,
we can now with ease lock the entire database, simply by introducing a new
synchronized method called lockDb(). Then, this makes it possible to use the
newWrite() method safely to create new records in the file; that is, you
would first call lockDb() sending it a command to forward a call to newWrite().
Question:
Does this sound alright; or is it another cheap trick with a flaw? (For
instance, my initial "brilliant" idea spoken about at the top of this article
was flawed in that it lessened concurrency).
Now, the design isn't quite correct yet because MyData is multi-threaded, and
you would experience failure if two threads simultaneously operated on the same
record. Thus, in this group I read about "mutex", found the algorithm at
Doug's web site, and posted simple articles in this group to learn about
elementary locking theory.
Once you understand that one mutex is used to control access to a shared
resource, you immediately expand this to the realization that each record in
the file is a shared resource, so you need a collection of mutex-like objects,
where each such mutex-like object relates to one record. This collection of
mutex-like objects I will call class NLock.

If NLock is a legacy collection (Hashtable or Vector, for example), then the
complete collection is synchronized, and thus concurrency is lost; so, this
would be a bad idea.
For our initial thoughts, let's say that NLock is an unsynchronized ArrayList
with a length equal to that of the physical database random access file's number
of records.
Within each index of NLock will reside our mutex-like object related to each
record in the file. An ArrayList will work fine in a multi-threaded environment
as long as we never change the size of the ArrayList. But, as the JavaDoc's say,
you cannot change the size of an ArrayList in a multi-threaded environment;
thus, we come upon our first major design problem: how to implement newWrite.
------
Part 2
------
I am not sure what the consensus is regarding newWrite(); do people feel they
need to have the functionality built in to create a new record in the file?
I don't know. Let's assume, then, that for this discussion, we do want to
implement newWrite so that a new record can be added to the RAF.
Given that we have decided to find a way to implement newWrite(), this implies
that at some point during processing, the size of the ArrayList NLock must be
able to get larger. When we want to do a newWrite operation, the following
steps need to be taken:
1. Stop all incoming threads to NLock and get them to wait around until told
to run.
2. Wait for all threads currently using NLock to complete.
3. Finally, when no thread is using NLock, send one thread in to carry out the
newWrite operation which will add a record to the RAF and add an element to the
ArrayList NLock.
4. When done, allow all those threads waiting to get access to NLock to proceed.
The above four steps probably can be implemented in two ways (or more):
1. As a queue, where the Queue class would sit just before the NLock class.
I would have to investigate this further, but the Queue controls thread access
to NLock, and knows which threads have not yet returned from NLock.
2. As another form of a locking mechanism; perhaps best investigated by trying
to find such a locking algorithm on the internet, or by trying to design it
myself, or both. This might mean extending the Thread class to make our own
customThread; it might mean using the observer-observable pattern; but, these
are just vague ideas at this point.
One thing is clear, however, when a thread is "waiting" to get access to NLock,
it should not be using up CPU cycles. And, another stumbling block to miss is
that we don't want this new class to erroneously lock down the complete database
implicitly all the time.
Question:
I would be curious if there are any other design ideas for locking the database
and increasing its size which I have not thought of or yet learned about.
Before rushing in to build a new class to handle the problem of adding a new
record to the RAF and to the NLock, we might consider if changing the class of
NLock from an ArrayList to a Map might help us. That will be considered in the
next section.
------
Part 3
------
This part investigates whether changing NLock from an ArrayList to a Map will
help us in implementing the newWrite() method safely.
Given that JavaDoc says this:

Note that this implementation is not synchronized. If multiple threads access
this map concurrently, and at least one of the threads modifies the map
structurally, it must be synchronized externally. (A structural modification is
any operation that adds or deletes one or more mappings; merely changing the
value associated with a key that an instance already contains is not a
structural modification.) This is typically accomplished by synchronizing on
some object that naturally encapsulates the map. If no such object exists, the
map should be "wrapped" using the Collections.synchronizedMap method. This is
best done at creation time, to prevent accidental unsynchronized access to the
map.

it does not solve our problem of growing the map's size. Now, it's true that
for certain efficiency reasons we may prefer to implement NLock as a Map instead
of an ArrayList, but that is an implementation detail. The point is that as
concerns our ability to use a given Java unsynchronized collection which we can
grow in size, there appears to be no off-the-shelf class for our use.
------
Part 4
------
This part brainstorms on a solution to implementing newWrite.
Given that we have decided to find a way to implement newWrite(), this implies
that at some point during processing, the size of the ArrayList NLock must be
able to get larger. When we want to do a newWrite operation, the following
steps need to be taken:
1. Stop all incoming threads to NLock and get them to wait around until told
to run.
2. Wait for all threads currently using NLock to complete.
3. Finally, when no thread is using NLock, send one thread in to carry out the
newWrite operation which will add a record to the RAF and add an element to the
ArrayList NLock.
4. When done, allow all those threads waiting to get access to NLock to proceed.
One possible,theoretical idea, is to think about the lock as having behavior:
1. The lock sees two classes of individuals attempting to gain the lock for a
record: classReadUpdate is a read or update lock request on a record;
classNewWrite is a newWrite request on the complete database.
2. The lock associated with each record changes its behavior. The lock's
default behavior is to accept a classReadUpdate. But,
once a SignificantEvent has occured, it's state changes so that it accepts no
new locking requests; furthermore, all locks within NLock are transformed in
their behavior so that they accept no new locking requests.
3. If due to multi-threading, two or more locks within NLock have accepted a
classNewWrite request, this is an implementation error; that is, newWrite
requests may also need to be synchronized off a separate object called
NewWriteSyncRequest before that thread is allowed to request a lock from NLock.
Having these newWrite requests be synchronized is not seen as a problem, as the
system can be designed not to frequently use the newWrite function by, say,
always adding at least 100 inactive new records to the RAF and NLock at a time.
In fact, if you plan to add 100 new inactive records at a time, there is no need
to have multiple newWrite requests at all. Thus, all newWrite requests can be
targeted against one unique object (NewWriteSyncRequest) contained within NLock
and not associated with any particular record; if this newWrite request is
still waiting to get going and another newWrite request comes in, you just
ignore the latest request. Or it could be that you allow the newWrite requests
to queue up,but if there are already 10 or more inactive records at the end of
the file, the newWrite request is ignored.
4. Once one newWrite request has obtained control of the NewWriteSyncRequest,
just like an insect, a "chemical" is sent to all the locks within NLock, and
these locks transform their behavior so that they no longer accept any new
locking requests. At some point, all the threads that currently hold locks
on the mutex-like objects within NLock will complete their work; and, mutex-like
object within NLock cannot transform its state until any thread that holds its
lock has released its lock.
5. At some point, every mutex-like object within NLock will be transformed so
that it accepts no more locks from classReadUpdate requests; and, this can
only occur if there are no active threads operating on any records. At this
time, a new, macro-chemical event is sent to the NewWriteSyncRequest sub-system,
and it knows that it is finally safe to do the following:
a. Add 100 new inactive records to the RAF.
b. Add 100 new locks to the NLock ArrayList.
6. When these new, inactive records have been added, an "enable chemical" is
sent to every element within NLock to change it's behavior back to its default
state where again each mutex-like object accepts requests from the classReadUpdate,
and individual records are locked and unlocked as before.
7. Some aspects of the above design idea may be able to leverage off the fact
that when MicroData's lockDb() method is called, the complete database file is
locked down.
8. The above ideas suggest that we now have some reason to know if a given
record is locked or not; that is, the isLocked() method now might actually
have a use in this application.
Implementing a boolean nLock.isLocked(recordNumber) method is considered rather
trivial, since every mutex-like object within NLock already knows whether or
not it is locked or unlocked.
You can probably optimize, with respect to time, the ideas given above by having
the macro-chemical sent out as soon as you know the thread has entered the
synchronized microData.lockDb() method.
Notes on chemicals used:
1. The current size of NLock is known to be N.
2. The NewWriteSyncRequest object sends the chemical to each cell within NLock.
3. Each cell, once transformed to accept no incoming lock requests, sends
another chemical signature back to NewWriteSyncRequest, and NewWriteSyncRequest
sums up these receipts until they equal N, and at that time NewWriteSyncRequest
knows the database is locked down, and sends a thread into microData.lockDb()
which in turn invokes microData.newWrite().
4. Once NewWriteSyncRequest is done, it sends out a final chemical to each
cell telling each cell to now start accepting record lock requests.
The above ideas may be implemented by changing NLock from an ArrayList to a Map.
Either collection will do, and may do as equally, but I will find out when
I enter a more detailed design level. The point is that whatever collection is
used, it must be unsynchronized, and it must currently hold at least the number
of records currently in the physical, database random access file.
------
Part 5
------
Our design now looks like this:

What is not explicitly shown is that NLock (based upon the writing ideas
in parts 2 and 4) also has a mechanism for locking down both NLock itself in its
entirety and the RAF (either implicitly or explicitly by invoking
microData.lockDb() which itself implicitly locks the RAF due to all its methods
being synchronized).
One thing I've seen mentioned in this group concerns lock ownership with
respect to locking and unlocking. That is, it is undesirable, and in fact
illogical for one thread to unlock a record which another thread locked. In
the above design, this is not really an issue because MyServer is doing all
the work, and the lock(), process(), unlock() string of commands are always
performed within one thread. But, maybe I should think about this to be certain.
Okay, it appears that this may be an issue. Here is an example:
Harry requests and obtains a lock on record 1.
Tom requests a lock on record 1 but waits.
Therefore, no one can interfere with Harry.
Harry processes record 1.
Harry unlocks record 1.
However, there could be a problem if the programmer makes an error thus:
Harry requests and obtains a lock on record 1.
Tom requests an unlock on record 1 erroneously.
Dick requests and obtains a lock on record 1.
Now Dick and Harry both believe they can operate simultaneously on record 1,
for after all, lock and unlock calls are programatically unenforcable
conventions.
Dick and Harry have now quite possibly messed up the database record 1, or maybe
even the database itself.
However, I'm not sure that one can put in every safeguard against illegal
programming. For instance, assume Tom, Dick, and Harry are integer ID's.
We now have this:

What stops a programmer from (1) leaving out the if test on isLocked()?, and
(2) what stops the programmer from erroneously coding isLocked(1)? Nothing.
For acquiring statistics during debugging and testing, it might be nice that
each mutex-like object within each cell of NLock keep track of how many lock
and unlock requests it processed, and how many chemical changes it underwent.
At any time, one can try to completely lock NLock, and then check all these
flags to make sure that for each mutex-like object, the number of locks and
unlocks that particular mutex-like object processed are equal.
For debugging, and given that programming errors do occur, it might be nice
to have a rudimentary ID system. But, this need not be associated with the
client, it can be associated with a code block. Each mutex-like object
assigns a new ID when it is locked; this can start on some random integer
value and increment by 1 (wrapping to a negative value is fine). But, this
does not assure you that any two records necessarily have unique ID's, but
it might, perhaps, find an error, but it is not assured. You would not want
to create a static ID for NLock, for then concurrency suffers.
It might be tempting to use the ClientID as an identifier, but it may be just
as desirable not to exclude Mr. Smith from using that client ID from work while
Mrs. Smith uses that same client ID from her place of work. Nor could you
combine the client ID with the computer network number, since Mr. and Mrs. Smith
could be time-sharing the same computer, Mr. Smith making a reservation
"simultaneously" (if the system if slow enough) while Mrs. Smith makes her
reservation.
I have read in this group that some consider sending the MyData object from
the client to the server, and that these MyData objects are by definition
unique. This certainly sounds like MyData might be unique in this context,
and as such would make an excellent ID. Presumably, even if two different
computers functioned absolutely identically, if they each sent their MyData
object to our server, our server would understand that myData1 != myData2,
because myData1 and myData2 are unique objects existing within the Java
run-time environment of our server (but I don't know this with certainty).
This means that we have a little extra complexity, for when our server accepts
the myData1 object, our server becomes, within this short context, the client,
while our client becomes our server. But, all in all, assuming myData being
sent to us from the client really is unique, it is probably the best, unique
identifier available.
However, there is no MyData on the client. At least in this particular
implementation, because MyData exists only on the server. So, this still
isn't a problem, the client can send over any predefined object that will
work as its ID, just so every client sends over the same object type.
Let's simply say that this ID is of type Object. Let's define Tom, Dick, and
Harry as type Object wherein each was sent over from the client and each has
a unique address within our server's JVM. And, let's see how the locking
and unlocking might work; note that Tom, Dick, and Harry are each operating in
separate threads:

Now we don't have to worry about coding errors where the ID is confused with
the record number. And, we won't be using Tom, Dick, or Harry as the ID
object, but a variable called, let's say, lockId, so the code looks pretty
clean for each thread:

Now the process() method may be able to enforce that the lock has been obtained
thus:

Or, it may make more sense to defer any check on nLock.isLocked() until the
very low read and update methods (the check for isLocked would not be required
for newWrite given the design).
------
Part 6
------
Our design now looks like this:

What is not explicitly shown is that NLock (based upon the writing ideas
in parts 2 and 4) also has a mechanism for locking down both NLock itself in its
entirety and the RAF (either implicitly or explicitly by invoking
microData.lockDb() which itself implicitly locks the RAF due to all its methods
being synchronized).
Also, what is not explicitly shown is that the lock, unlock, and isLocked
methods now have Object ID's associated with them and that these ID's were
sent over from the client; these ID's are stored in each mutex-like object
cell within NLock.
It is interesting to consider the major, competing macro design, and this is
when the client directly accesses the methods of MyData.

Even though I wrote in the above figure that MyData uses NLock, NLock is
really the guy in charge; that is, NLock controls the access to MyData's
read and update methods. How you might integrate NLock more closely with
MyData is not being discussed as it is considered an implementation detail.
Note that the above design probably doesn't need to be re-worked too much with
respect to newWrite, because NLock controls MyData, and when NLock is totally
locked down, MyData becomes totally locked down too.
It is interesting that because we defined the MicroData class, we are not forced
to store the file into memory like this:

Presumably if the file is stored in memory, then the reading and writing of the
file would only occur when the server starts up or shuts down; or perhaps added
logic is necessary to periodically save the database in memory to disk.
I'm not sure I see any benefit from storing the file in memory, and simultaneously
updating the FileInMemory as well as simultaneously updating the physical,
database random access file. But, this is not a design which I have studied
before. More than likely, I have seen references in this group where some
developers have made the FileInMemory class a sophisticated cache used to
enhance performance (for instance, on a read, you don't always have to read
from disk, but can read from the FileInMemory class instead).
For simplicity, we will consider the competing design in its simplest form:

Note that the now missing MyServer class did simple transforms between
English-like commands to specific calls to lock, process, and unlock. It may
be desirable to move some of these more English-like, server-like methods
over to the client side thus:

and in this way, the "client" can talk to its local "server" using English
commands like this: myLocalServer.readAllRecords().
There is one, perhaps significant, added complexity this competing design adds,
and that is how to deal with client requests which die.
I speculate that this would occur in this scenario:
1. the client calls remote myData.lock(lockId, recordNumber = 1);
2. the client machine is unplugged!
3. now record 1 on the server is locked forever.
Based upon my light readings in this group on this topic, the following terms
and the following transformations would probably be made:
1. NLock would be transformed into a particular Map such that when the remote
ObjectId from the client "died", the lock would be released.
2. "weak reference" term is used.
As you can see, I don't know much about this particular aspect of this
competing design, but I am writing about it to discover its differences and
similarities, and to see "what I'm missing" by not implementing this particular
design. I may, at some future point, simply to get "full exercise" of my
skills, transform my design to this design, since it adds this extra complexity
which may be important to learn about (also, it's possible that Sun has changed
its exam so that this particular design is now required, thus suggesting that
Sun considers it important for us to investigate and implement, though, for me,
it is not required). If I'm not mistaken, there are numerous threads in this
group relating to this particular aspect, but I have not yet absorbed them.
When I learn more about this topic, I will probably edit this posting.
------
Part 7
------
Here are a few of the designs listed from top to bottom as approximately easiest
to hardest to implement:

where FIM means "FileInMemory" and where it could be implemented as holding only
parts of the file or all the file. There may be other variants of the above, but
I think that there are basicaly two main variations, and for each variation,
you may or may not have a FIM.
Thanks,
Javini Javono
 
Consider Paul's rocket mass heater.
 
subject: Design 01