Threads 002
Javini Javono
Ranch Hand
Joined: Dec 03, 2003
Posts: 286
Hi,

Could you please verify this assertion:

A multi-threaded object instance, ObjectName, has a static method which does non-trivial transformations on other class (static) variables (including types long, float, and double): therefore, assuming the Java keyword "volatile" is not used, either

1. The static, multi-threaded method must be synchronized, or
2. At critical points within the static multi-threaded method, it must synchronize a code block on ObjectName.class, or
3. The invocation of the static method must be synchronized on ObjectName.class like this:

Since writing the above, I've verified it to be true through text books. The one thing I did not know, however, was that the volatile keyword can also be applied to all primitive types: boolean, short, int, and float, as well as the 64-bit types long and double. Note that volatile only guarantees visibility and atomic single reads and writes; a compound read-modify-write on a volatile variable still needs synchronization.

What I have not done yet is a test. Does a simple synchronization of a method "cost" in speed from 5 to 10 percent? Probably. Then, how much of a cost in speed is the use of volatile, if anything significant? I might eventually do a simple speed test on this.

Thanks,
Javini Javono

Hi (again),

This posting discusses every aspect of thread safety that I can think of at this time that relates to this project. I'm reviewing this topic as I am about to go into the implementation and re-implementation stages for my RMI-based server-side software. I'll be using this post as a condensed resource, assuming that the information is accurate, since I find that sometimes these threading issues slip my mind. Of course, I appreciate any comments, corrections, and expansions.

Assumptions:

Some assumptions change in different sections of this posting. The following assumptions are constant throughout:

1. There is one database random access file.
2. Only RMI connections are used in these examples.
3. The client has access to a remote Data object which, for simplicity, is simply called Data (instead of DataImpl).
Data does not directly manipulate the file, but instead calls MicroData, which carries out very low-level file manipulations. Every method of MicroData is synchronized.
4. There is a separate lock manager object called Guard which has three methods: lock(recordNumber), unlock(recordNumber), and isLocked(recordNumber).
5. The Client defines business methods, such as the following:

    guard.lock(100);
    fileStuff = data.read(100);
    fileStuff = someProcess(fileStuff);
    data.write(fileStuff, 100);
    guard.unlock(100);

The invocation and usage structure looks something like this:

Both Data and Guard are remote objects, and the client can invoke methods on both objects.

Part 1: General Considerations
-------------------------------

Let's first discuss what it means to be "threadsafe." I suspect that this has different interpretations in different environments. This stuff I always find tricky, so don't be surprised if I make an erroneous statement here and there.

For this theoretical discussion, we will be writing a Data class which uses the following class, Guard:

For this example, there will be one instance of Data and it will be multi-threaded. The following simple rules can be applied to make Data threadsafe:

1. Do not declare any static class variables.
2. Do not declare any instance variables.
3. Only use method variables.

These are simple guidelines, and there are more advanced rules not mentioned above. Following the above simple rules, we generate the following threadsafe code for Data:

Data is a threadsafe class due to the following:

1. Each thread gets its own unique inputValue primitive.
2. Each thread gets its own copy of value from the stack.
3. Any potentially shared variables that are class or instance variables are final and unchangeable, or are immutable and unchangeable. Here I am assuming that a well designed and coded immutable object is threadsafe, and I am assuming that the String class is threadsafe, even though its JavaDoc does not touch upon this topic.
4. The return value in the declaration, the "long", belongs uniquely to each thread.
5. The code does not attempt to share one instance of object Guard, but instead uses a new instance, and this new instance of Guard is used exclusively by one thread.
6. The invocation of the static method Guard.incrementInput() would not be threadsafe if this class method were not declared synchronized. There exists only one such static method, and, obviously, if process() is multi-threaded, then multiple threads could simultaneously be passing through this method.
7. The static method Guard.incrementUnsyncInput() is unsynchronized, and thus not threadsafe. So, within the process() method, the code synchronizes on the Guard class before invoking this method.
8. Because Data is multi-threaded, its instance member, guardInstance, is shared by all the threads using Data. To use this shared instance object in a multi-threaded environment, it must be synchronized, and that is what occurs in the above code.
9. mutableObject is passed in as an input argument to the method; it is, of course, a reference to one object. If the invoking method was calling our method like this:

then we would be assured that each thread passing through the process() method would have its own, unique mutableObject and operating on it would be threadsafe. However, for this example, we are assuming that the invoking method looks more like this:

Thus, mutableObject has only one instance, and so in the method process(), which receives mutableObject as a reference, that one instance of mutableObject must be synchronized before mutating it. Now of course, if we pretend that this mutable object was a Vector, which is coded to be threadsafe, then we would have no need to synchronize it since it has already been designed to safely handle multiple threads (this is usually done by simply synchronizing all the methods of the class).
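As a rough illustration of points 1 through 9, here is a minimal sketch of what such a Data and Guard pair might look like. The names process(), inputValue, value, guardInstance, mutableObject, incrementInput() and incrementUnsyncInput() come from the discussion above; everything else (method bodies, the total() and record() helpers, the List<Long> type, and the DataDemo driver) is an assumption made only so the example compiles and runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Guard; bodies are assumed for illustration.
class Guard {
    private static long total = 0;   // shared class (static) state
    private long calls = 0;          // per-instance state

    // Point 6: declared synchronized, so it locks on Guard.class.
    static synchronized void incrementInput(long n) { total += n; }

    // Point 7: NOT synchronized; callers must hold the Guard.class monitor.
    static void incrementUnsyncInput(long n) { total += n; }

    static synchronized long total() { return total; }

    // Not synchronized: safe only when confined to one thread or externally locked.
    void record() { calls++; }
    long calls() { return calls; }
}

class Data {
    // Point 8: one instance member shared by every thread using this Data.
    private final Guard guardInstance = new Guard();

    long process(long inputValue, List<Long> mutableObject) {
        long value = inputValue + 1;              // points 1, 2: stack-local
        Guard own = new Guard();                  // point 5: used by this thread only
        own.record();

        Guard.incrementInput(1);                  // point 6: synchronized static method
        synchronized (Guard.class) {              // point 7: lock the class ourselves
            Guard.incrementUnsyncInput(1);
        }
        synchronized (guardInstance) {            // point 8: shared instance member
            guardInstance.record();
        }
        synchronized (mutableObject) {            // point 9: one shared argument object
            mutableObject.add(value);
        }
        return value;                             // point 4: belongs to the caller
    }

    long sharedCalls() {
        synchronized (guardInstance) { return guardInstance.calls(); }
    }
}

class DataDemo {
    // Sends several threads through one Data with one shared list.
    // Returns {amount added to the static total, list size, shared Guard calls}.
    static long[] run(int threads, int callsPerThread) {
        Data data = new Data();
        List<Long> shared = new ArrayList<>();
        long before = Guard.total();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    data.process(i, shared);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { Guard.total() - before, shared.size(), data.sharedCalls() };
    }
}
```

Calling DataDemo.run(8, 1000) should account for every call in all three counters; removing any one of the synchronized blocks is likely to make the totals come up short under contention.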
The code is threadsafe, in short, because it is coded so as not to use anything which might be shared between two or more threads; when it does go beyond these elementary rules and use shared objects, it ensures that their use is safe through synchronization.

If you reasonably refactor your design so that your multi-threaded classes follow the simple rules given above, then you minimize, or eliminate entirely, the need for the synchronized keyword in your code. This makes the code easier to understand, and lessens the probability that you will accidentally deadlock yourself, or that someone coming along after you to maintain the code will make a seemingly harmless change, here and there, only to find that the code deadlocks.

Part 2: Combinations
---------------------

We will assume always that on the server side there is only one instance of Guard and that it is multi-threaded. We will consider the following assumptions about Data in turn:

Case 1
------
Client(1) --> single-threaded Data(1)

Designing and coding Data is the simplest because it is not multi-threaded. Even when you have two clients, each with its own Data object, Data is still single-threaded and still easier to design and code:

Client#1 --> single-threaded Data instance number 1
Client#2 --> single-threaded Data instance number 2

Case 2
------
Client(1) --> multi-threaded Data(1)

This is unusual, but would occur if you allowed the clients to create multiple threads and send them through Data. An example might be a shopping cart containing 10 hotel reservations; the client then attempts to book 10 reservations simultaneously by sending in 10 threads to use Data at once. Designing and coding Data is harder since it must be threadsafe.

Case 3
------
Client(N) --> multi-threaded Data(1)

Designing and coding Data is harder since it must be threadsafe.

Case 4
------
Client(N) --> multi-threaded Data(M)

This is perhaps clearer if we restate it like this:

We see that it is not that remarkable in nature. For instance, what we are doing on the server side in our factory is this: we are saying in advance that we will define not more than 3 Data instances; and as we accept new clients, we share these three instances among the N clients; thus, each instance of Data is potentially multi-threaded. We allow there to be as many Data instances as might be required depending on the load of the system and how much memory the server has.

Case 4 is simply not required for the exam, of course. But it brings up an interesting question: is RMI a toy? Or is it a real, potential server-based process? One way to achieve scale would be to have a factory decide dynamically how many Data instances are needed and how many threads each instance should serve, just as a servlet container does.

Part 3: Forced Shared Resource: Guard
---------------------------------------

Regardless of how you design your server-side system, that is, whether Data is single-threaded or multi-threaded, you are forced, by the requirements of the project, to share the lock manager, which I will call class Guard.

Now, you could synchronize every method of Data, and then it could be multi-threaded and also be threadsafe as long as only one instance of Data ever existed on the server; but this loses concurrency, and is not considered acceptable.

Therefore, you are forced to implement a solution using the Java synchronized keyword, using wait(), notify(), notifyAll() (in some subset or combination), and to design a locking mechanism of one type or another.

Part 4: Business Methods
-------------------------

Let it be assumed that Client contains business methods.
If the business method is a compound operation relying on the fact that a previously read record has not changed before an update or write is made to the same record, then it is obvious that the following construct within the business method is required:

However, what about a business method which simply reads in a given record? There are two choices: we read in the record and don't use the locking and unlocking mechanisms, or we use the locking mechanism before reading the record.

For this example, let's assume another business method which deletes a record from the database:

Now, at the same instant:

Some business process decides that record 100 should be deleted.
Client 1 decides to read record 100 to see what is in it.

Let's discover the ramifications concerning Client 1 when the Guard is used and when the Guard is not used. Let's begin by assuming that the Guard is used. Then the reading business method would look something like this:

Now, what exactly have we gained by using the Guard for this read? The record 100 either exists or it doesn't exist. Whether record 100 exists is like atomic theory (sort of): it's uncertain. Even if our business read method uses Guard, record 100's existence cannot be determined in advance. Using Guard doesn't guarantee the "correct" answer, for if the business read method gets the lock first, then the correct answer is that record 100 exists. But if the delete business method gets the lock first, then the equally correct answer is that record 100 no longer exists. Locking record 100 does not assist us in any way.

Therefore, I think we can safely state the following compound hypothesis:

1. If any record in the database is never left by any sub-step of a business process in an inconsistent state, then
2. It is perfectly acceptable for the business read method not to attempt to lock the record it is reading.
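The two kinds of business method discussed above can be sketched roughly as follows: a compound update that holds the record lock across its read and write (releasing it in a finally block), and a plain read that takes no lock. This is only an illustration under assumptions: SimpleGuard, RecordStore, BusinessMethods and the "name:status" record format are hypothetical stand-ins, not the project's actual Guard and Data classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lock manager: blocks while another thread holds the record.
class SimpleGuard {
    private final Set<Integer> locked = new HashSet<>();

    synchronized void lock(int recordNumber) {
        boolean interrupted = false;
        while (locked.contains(recordNumber)) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;            // keep waiting, restore the flag later
            }
        }
        locked.add(recordNumber);
        if (interrupted) Thread.currentThread().interrupt();
    }

    synchronized void unlock(int recordNumber) {
        locked.remove(recordNumber);
        notifyAll();
    }
}

// Hypothetical stand-in for Data: every method leaves each record consistent.
class RecordStore {
    private final Map<Integer, String> records = new HashMap<>();

    synchronized String read(int recordNumber) { return records.get(recordNumber); }
    synchronized void write(String value, int recordNumber) { records.put(recordNumber, value); }
    synchronized void delete(int recordNumber) { records.remove(recordNumber); }
}

class BusinessMethods {
    private static final String FREE = "free";
    private final SimpleGuard guard;
    private final RecordStore data;

    BusinessMethods(SimpleGuard guard, RecordStore data) {
        this.guard = guard;
        this.data = data;
    }

    // Compound operation: the record read must not change before the write.
    boolean book(int recordNumber, String customer) {
        guard.lock(recordNumber);
        try {
            String current = data.read(recordNumber);
            if (current == null || !current.endsWith(":" + FREE)) {
                return false;                  // deleted or already booked
            }
            String prefix = current.substring(0, current.length() - FREE.length());
            data.write(prefix + customer, recordNumber);
            return true;
        } finally {
            guard.unlock(recordNumber);        // always release, even on failure
        }
    }

    // Mutating operation: also goes through the Guard.
    void remove(int recordNumber) {
        guard.lock(recordNumber);
        try {
            data.delete(recordNumber);
        } finally {
            guard.unlock(recordNumber);
        }
    }

    // Simple read: no lock. The answer is whatever state was last committed,
    // or null if a delete got there first; both are "correct".
    String view(int recordNumber) {
        return data.read(recordNumber);
    }
}
```

The unlocked view() is only acceptable here because each RecordStore method is atomic on its own, which is the condition stated in point 1 of the hypothesis.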

Now, the above is my opinion, I should add: that is, the qualification at point 1 is something I personally consider important (and I believe that other people might consider it not important).

Now the question becomes, do there exist any business methods which in any way mutate a record such that during any sub-steps that record is in an inconsistent state? Basically, this becomes a contract which must be enforced through the code. Thus, every method in Data must carry out enough steps such that when this method is exited, the record is never in an inconsistent state. For example, it is conceivable that a method in Data might, under some hypothetical circumstance, need to be synchronized to ensure that sub-steps within the method never leave the record in an inconsistent state. If this can be enforced, we are free to not have our reads use the Guard object.

Part 5: Reads Free of Guard Use and Implications on Guard's Dynamic Structure
------------------------------------------------------------------------------

Once we decide that business reads of records do not require Guard's lock() and unlock(), then this means that Guard must at the minimum have one mutex for any record which is being mutated in some way by a business operation.

Here are some concerns about the Guard class:

1. How much memory is available for its use, and how large the Guard object will become.
2. How much synchronization is required for the Guard class to function. The more synchronization, the more contention; that is, the more processing time is spent locking and unlocking monitors for numerous contending threads.

However, once we decide that business reads will not use the Guard class, then that means that far fewer threads and far fewer records will be using the Guard class. This suggests that we do not need to be overly concerned about contention.
However, if there is a busy day at the office and many bookings are being made, then, since each booking is a record mutation operation, each record booked is, at some point, used within the Guard class, which can, under adverse circumstances, eventually consume too much memory.

[Aside: I should remind you that, for this project, these are not real issues. But the issues are interesting, so that is why I am studying them.]

In short, the problem is this: once we book a record, it is highly improbable (but not necessarily impossible) that the record will be mutated again. Thus, how do we remove this record from the Guard?

The problem exists because when you use an unsynchronized Guard design, and that design is being multi-threaded, you can't safely grow or shrink the collection, whether this collection be an ArrayList or a HashMap or a WeakHashMap.

If you look at some of the algorithms, you may find, though I have not carried out this study myself, that using a synchronized collection does not introduce significantly more contention than an unsynchronized collection. In this scenario, where a synchronized collection is used, it would be safe to shrink the Guard's size either instantly or on a periodic basis, perhaps performed by a background thread.

Another approach is to use Phil's algorithm, which consists of the following constructs for the Guard object (I summarize, so please see Phil's article):

1. The Guard object uses a HashMap.
2. Each item in the hash table is a MUTEX.
3. The particular form of the MUTEX is what I call a "linked-list mutex" wherein the first thread in is the first thread to gain access.
4. And, Phil may not really be using a MUTEX (mutually exclusive lock); while I don't know, it's possible he uses a type of lock which might be termed "mutually excluded writes" and "unexcluded reads".
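As a rough sketch of the simpler, synchronized-collection variant mentioned above (this is not Phil's algorithm, which is not reproduced here), the Guard below keeps an entry only while a record is actually locked and removes it "instantly" on unlock, so the map shrinks back once bookings complete. The class name ShrinkingGuard, the size() helper and the owner check in unlock() are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Guard: a single monitor protects a plain HashMap.
// Not reentrant: a thread that locks the same record twice blocks itself.
class ShrinkingGuard {
    // Record number -> owning thread; an entry exists only while the lock is held.
    private final Map<Integer, Thread> owners = new HashMap<>();

    synchronized void lock(int recordNumber) {
        boolean interrupted = false;
        while (owners.containsKey(recordNumber)) {
            try {
                wait();                        // released by notifyAll() in unlock()
            } catch (InterruptedException e) {
                interrupted = true;            // keep waiting, restore the flag later
            }
        }
        owners.put(recordNumber, Thread.currentThread());
        if (interrupted) Thread.currentThread().interrupt();
    }

    synchronized void unlock(int recordNumber) {
        if (owners.get(recordNumber) != Thread.currentThread()) {
            throw new IllegalStateException(
                "record " + recordNumber + " is not locked by the calling thread");
        }
        owners.remove(recordNumber);           // shrink immediately
        notifyAll();                           // wake threads waiting on any record
    }

    synchronized boolean isLocked(int recordNumber) {
        return owners.containsKey(recordNumber);
    }

    // Number of records currently tracked (equals the number currently locked).
    synchronized int size() {
        return owners.size();
    }
}
```

Because every method holds the same monitor, growing and shrinking the HashMap is safe; the cost is that all lock and unlock calls contend on that one monitor, and notifyAll() wakes waiters for unrelated records as well.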

Phil's algorithm (so Phil asserts and I believe, though I personally have not worked it out myself) allows the unsynchronized HashMap to dynamically grow and shrink in size.

[Aside: again, an interesting question is to study whether a synchronized HashMap would add any significant contention issues.]

Notice, by the way, that if we sent every business request through the Guard, including simple business reads, the Guard, regardless of how it would be implemented, would always hold mutexes for each record in the file, since we can assume that, given enough users, the complete file is always being read for searches. So, this whole discussion has but little relevance unless one intends to use business reads not requiring the use of the Guard class.

Part 6: Multi-Threaded Data and Guard
--------------------------------------

For every database file, and we only have one for our project, there must exist only one, unique Guard.

So, if each Data is single-threaded, and there certainly will be more than one Data instance, Guard can be an instantiated object with instance methods; each Data instance would have a reference to the same Guard object. Of course, Guard can always be a non-instantiated class having only static methods.

If Data is multi-threaded, and there is only one Data instantiated, Guard can be an instantiated object with instance methods; each Data instance would have a reference to the same Guard object. Of course, Guard can always be a non-instantiated class having only static methods.

If Data is multi-threaded, and there can be more than one Data instantiated, Guard can be an instantiated object with instance methods; each Data instance would have a reference to the same Guard object. Of course, Guard can always be a non-instantiated class having only static methods.

The point is, then, that Guard is meant to be a loner, to be associated with only one database file, and that Guard is meant to be multi-threaded.
Even though Guard is a shared object, by Guard's very nature of being a thread resource allocator concerning records, usually no special handling of Guard itself is required. That is, the following line of code works equally well in a single-threaded Data as in a multi-threaded Data:

Of course, we have not yet said that Data would contain business methods within it, but the above code represents a multi-threaded business method somewhere (either on the server or on the client, depending on how you set up your design); placing the above example in Data is not how my design will be.

Nevertheless, given some multi-threaded object somewhere that contains business methods, simply calling guard.lock(500) is not a multi-threading issue because it is threadsafe. However, within a multi-threaded business method, the following complete business method may not be threadsafe:

or perhaps the assignment of the record number itself is sufficient to make the multi-threaded business method not threadsafe:

Thus, within a multi-threaded business method, Guard and its methods are threadsafe, but the surrounding code may be full of traps for the unwary. [Which is why I'm reviewing this important material.]

If treading in a multi-threaded environment makes you feel uncertain, then it is recommended, obviously, that you use a factory to deal out exactly one instance of Data to each client. No one will question, even for a second, that single-threaded code is easier to understand, modify, and verify than multi-threaded code. I may very well go this route myself.

An interesting question Phil has contended with is this: what if I design my system so that it can work equally well on more than one database file? Based upon a posting Phil made, I speculate that Phil's design only has one Guard instance, even when there are multiple database files. There probably is a very good reason Phil settled on this design (since he is very sharp).
So, let's investigate this and see why he did not choose the simpler design, in which one instance of Guard is created for each different database. We have already determined that Guard need not be a non-instantiated class with only static methods. Thus, we can certainly instantiate one Guard object for each different database. By the way, it may make more sense to start referring to these databases as database tables.

If we do this, then, of course, the Guard object does not need to have coding logic to account for different database tables. Let's give it a walk-through. The multi-threaded or single-threaded business method looks like this:

In conclusion, it is unclear why any coding logic for two database tables would exist within the Guard object, unless that logic was relatively small and dealt only with pairing the correct Guard with the correct underlying database.

Again, our projects need not deal with two databases or two database tables.

Thanks,
Javini Javono

[ February 12, 2004: Message edited by: Javini Javono ]