Originally posted by Barry Gaunt:
In the NX (Contractors) assignment we have to implement a couple of methods.
Once we have locked a record and obtained our lock cookie we may proceed to update a record or mark the record as deleted.
The lock cookie authorizes the client to be able to perform the update or delete operations; any other client supplying a different cookie will be rejected.
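A minimal sketch of the cookie check described above, assuming a hypothetical RecordLocks class. The names lock, checkCookie, and unlock, and the use of a random long as the cookie, are illustrative and not the assignment's required interface. Every method synchronizes on the same monitor, and update or delete code would call checkCookie before touching the record:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical record-lock manager; not the assignment's actual Data interface.
class RecordLocks {
    private final Map<Integer, Long> cookies = new HashMap<>(); // recNo -> cookie of current holder
    private final Random random = new Random();

    // Blocks until the record is free, then stores and returns a fresh cookie.
    synchronized long lock(int recNo) throws InterruptedException {
        while (cookies.containsKey(recNo)) {
            wait(); // released by notifyAll() in unlock()
        }
        long cookie = random.nextLong();
        cookies.put(recNo, cookie);
        return cookie;
    }

    // Rejects any caller whose cookie does not match the one handed out by lock().
    synchronized void checkCookie(int recNo, long cookie) {
        Long expected = cookies.get(recNo);
        if (expected == null || expected != cookie) {
            throw new SecurityException("Record " + recNo + " is not locked with cookie " + cookie);
        }
    }

    synchronized void unlock(int recNo, long cookie) {
        checkCookie(recNo, cookie); // same monitor, so this reentrant call is safe
        cookies.remove(recNo);
        notifyAll(); // wake threads waiting in lock()
    }
}
```

A client that locks record 7 and receives cookie c can update or unlock with c; any other cookie makes checkCookie throw SecurityException.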
My question is: can the update and deletion operations now be left unsynchronized? I want to assume that the single authorized client will not perform any multithreaded update, delete, or unlock operations.
Any thoughts, anybody?
Thanks in advance
[Max]: I'm a big advocate of the rule of synchronization, which states: Don't synchronize unless you have to.
I would emphasize the flip side of this: if you're at all uncertain whether synchronization is really necessary in a given situation, it's usually safer to synchronize.
Originally posted by Jim Yingst:
[Max]: I have to disagree here
Them's fightin' words! Howzabout you'n me step outside and... err, waitaminnit. You're the one who teaches martial arts, right? You'll probably kick my butt. OK, I'll just verbally disagree. Somewhat. With respect.
and I think the sentiment is consistent with Sun's, which migrated away from Vectors to ArrayLists, with the chief difference being, of course, that Vectors are synchronized, and ArrayLists are not.
Well, to my mind the biggest problem with Vectors is that even if you do require synchronization, most of the time Vector's synchronization is at the wrong level.
E.g. individual methods are synchronized, but you often need synchronization to prevent interruption between method calls as well. The most common example:
(I know we could use an Iterator above, but get(i) is still pretty common, and also faster.) The problem, of course, is that if another thread does a remove() just before the get(), we may get an ArrayIndexOutOfBoundsException or something similar. Synchronization in this case (if required at all) needs to be handled by a block enclosing both the size() and the get(). This sort of thing happens fairly often, and the "thread safety" of Vector provides a false sense of security that does more harm than good, IMHO. That doesn't mean that synchronization per se should be avoided; it means that if you use synchronization, it should be put in at the right level.
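The size()/get(i) race described above can be sketched as follows. The summing loop is a hypothetical stand-in for the original example, which did not survive. The fix holds the Vector's own monitor around the whole loop, since Vector's synchronized methods lock the Vector itself:

```java
import java.util.Vector;

class VectorLoop {
    // Each call is synchronized on its own, but another thread can remove an
    // element between size() and get(i), so get(i) may throw
    // ArrayIndexOutOfBoundsException.
    static int sumUnsafe(Vector<Integer> v) {
        int sum = 0;
        for (int i = 0; i < v.size(); i++) {
            sum += v.get(i);
        }
        return sum;
    }

    // Holding the Vector's monitor for the whole loop blocks other threads'
    // add()/remove() calls on this Vector until the loop finishes.
    static int sumSafe(Vector<Integer> v) {
        synchronized (v) {
            int sum = 0;
            for (int i = 0; i < v.size(); i++) {
                sum += v.get(i);
            }
            return sum;
        }
    }
}
```

With a single thread both methods return the same result; the difference only shows up when another thread removes elements mid-loop.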
It's true that Collections.synchronizedList() and similar methods allow us to put this false sense of security back into our programs, again, at the wrong level. But fortunately these methods are frequently overlooked by many developers, so they don't do as much damage.
Aside from the "false security" issue, there was a performance reason to move away from Vector's approach.
The move from Vector to ArrayList was because Sun wanted to make synchronization optional and more under the control of the programmer. By the time collections came along, there wasn't any practical way to add an "unsynchronized" keyword to the language which would somehow remove the synchronization that had already been put into Vector; they had to build a replacement from scratch. Much easier to put in synchronization when you need it than to take it out (from a library class) when you don't.
A nested digression: speaking of the evil Vector class, why on earth do you (Max) use it in your Denny's DVDs solution? E.g. on pp. 128-131 of your book. The reservedDVDs variable would be much better off as a HashSet rather than Vector, no? (Especially if the database ever gets a larger number of records.) Is this a case where the assumption is that developers may not know their collections very well, and you didn't want to get into that discussion?
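A sketch of the HashSet alternative, assuming a hypothetical ReservedRecords wrapper (this is not the book's code). A Set also rejects a duplicate reservation, which a Vector silently accepts, and because HashSet is unsynchronized, every access goes through a synchronized method:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical replacement for a Vector of reserved record numbers.
class ReservedRecords {
    // Expected constant-time add/contains/remove, versus Vector.contains,
    // which scans the elements one by one.
    private final Set<Integer> reserved = new HashSet<>();

    // Returns false if the record was already reserved.
    synchronized boolean reserve(int recNo) {
        return reserved.add(recNo);
    }

    // Returns false if the record was not reserved.
    synchronized boolean release(int recNo) {
        return reserved.remove(recNo);
    }

    synchronized boolean isReserved(int recNo) {
        return reserved.contains(recNo);
    }
}
```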
I suppose I can understand that. But my own opinion is, collections have been around for some time, and SCJP 1.4 expects programmers to understand them. It's past time to stop coddling people by treating Vector as the default collection - at this point it holds people back more than it helps them.
OK, done with that bit of venting. End nested digression, back to our main digression.
However, the tipping factor, for me, is that code is not synchronized by default: thus, the Java language, by its nature, seems to advocate synchronization on demand, and not just in case.
Hmmm, maybe. But personally I'm having a hard time imagining the converse, to find a workable way to set the language up so that synchronization was the default.
Our positions may not be all that far apart; we may just be interpreting each other's arguments more extremely than they were intended. Your "synchronize only when you need to" would be more palatable to me if we add "but study carefully to be really sure you don't need to". Conversely my "when in doubt, synchronize" might be better with the addendum "but work hard to resolve your doubts, because unnecessary synchronization can just complicate things and bog them down, or create deadlock." Either way, analyzing how the code really works is critical to making a good decision here.
Ok, but what does this have to do with two locks? Basically, I think two locks are overkill, and can open the door to deadlocks and other threading problems. Even if you implement them correctly, the code is still that much more risky and complicated to maintain, because the people coming after you have to know how to do it correctly also. All in all, I'm an advocate of single monitors, if at all possible.
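To illustrate the maintenance burden of two monitors, here is a hypothetical class (all names illustrative) that guards one piece of state with two monitors. It is deadlock-free only because every method acquires them in the same order; reverse the order in just one method and two threads can each hold one monitor while waiting forever for the other. In this form the pair behaves no differently from a single monitor, which is the argument for using one:

```java
// Hypothetical example; cookieMonitor and fileMonitor are illustrative names.
class TwoMonitors {
    private final Object cookieMonitor = new Object();
    private final Object fileMonitor = new Object();
    private int operations; // guarded by both monitors

    // Ordering rule: always cookieMonitor first, then fileMonitor.
    void update() {
        synchronized (cookieMonitor) {
            synchronized (fileMonitor) {
                operations++;
            }
        }
    }

    // Taking fileMonitor first here would allow a deadlock with update().
    void delete() {
        synchronized (cookieMonitor) {
            synchronized (fileMonitor) {
                operations++;
            }
        }
    }

    int operations() {
        synchronized (cookieMonitor) {
            synchronized (fileMonitor) {
                return operations;
            }
        }
    }
}
```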
I agree with this sentiment in general, but I don't think it's possible to guarantee that the program will obey the API we're assigned to implement unless we synchronize on something in order to look up expected cookie values. Or maybe use volatile, if we decide to trust that. But latency in memory reads is too much of a risk, I think, if it prevents users from updating a record they've already locked.
Note that I'm not 100% sure I know what you mean when you refer to "two locks" - two synchronized blocks, one to write and then one to read a cookie value? The problem is that "lock" is used in the assignment to refer to whether a record is locked or not; this is independent of whether we're currently in synchronized code or not. I try to refer to synchronization and monitors when that's what I mean, and use "lock" only to refer to whether a record is considered locked. It's all Sun's fault for using the word "lock" for two different things here.
A method dispatched by the RMI runtime to a remote object implementation may or may not execute in a separate thread. The RMI runtime makes no guarantees with respect to mapping remote object invocations to threads. Since remote method invocation on the same remote object may execute concurrently, a remote object implementation needs to make sure its implementation is thread-safe.
Now user A attempts an update() using cookie 12345. The update code needs to check the user-supplied cookie against the value it has in memory. IF the update is being made from a different thread than the one which had previously called lock(), then since the code here is not synchronized (and assuming we're not using volatile because it's unreliable), the current thread may manage to retrieve an older cached copy of the cookie, such as 0, rather than the new "correct" cookie value of 12345.
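The scenario above can be sketched with two single-thread executors standing in for two pooled server threads; the class and method names are hypothetical. A stale read cannot be reproduced on demand: in this demo Future.get() already orders the two calls, so even readUnsynchronized() would return 12345 here. The point is the pattern, with the write and the read under the same monitor:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CookieHandoff {
    private long cookie; // guarded by "this"

    // Runs on "thread 1": stores the cookie handed to user A.
    synchronized void lock(long newCookie) {
        cookie = newCookie;
    }

    // Risky: with no synchronization (or volatile), a thread other than the
    // writer is not guaranteed to see the latest value and could read a stale 0.
    long readUnsynchronized() {
        return cookie;
    }

    // Safe: reading under the same monitor as the write guarantees the reader
    // sees what lock() stored, whichever thread called lock().
    synchronized long readSynchronized() {
        return cookie;
    }

    // Mimics a server that services user A's lock() and update() on different threads.
    static long demo() throws Exception {
        CookieHandoff data = new CookieHandoff();
        ExecutorService thread1 = Executors.newSingleThreadExecutor();
        ExecutorService thread2 = Executors.newSingleThreadExecutor();
        try {
            Runnable lockCall = () -> data.lock(12345L);
            Callable<Long> updateCheck = data::readSynchronized;
            thread1.submit(lockCall).get();
            return thread2.submit(updateCheck).get();
        } finally {
            thread1.shutdown();
            thread2.shutdown();
        }
    }
}
```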
Now the "IF" above signals the part that I now realize is dubious: how likely is it that method calls from user A will be occurring in two different threads?
In summary, whenever multiple threads share mutable data, each thread that reads or writes the data must obtain a lock. Do not let the guarantee of atomic reads or writes deter you from performing proper synchronization. Without synchronization, there is no guarantee as to which, if any, of a thread's changes will be observed by another thread.
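A minimal sketch of the rule quoted above, using a hypothetical stop flag: the writer and the reader both go through synchronized methods on the same object, so the background loop is guaranteed to see the request and exit.

```java
class StopFlag {
    private boolean stopRequested; // guarded by "this"

    // Both the write and the read take the same monitor. Synchronizing only
    // the writer would not oblige the reading thread to see the change.
    synchronized void requestStop() {
        stopRequested = true;
    }

    synchronized boolean stopRequested() {
        return stopRequested;
    }

    // Starts a background loop, asks it to stop, and reports whether it stopped.
    static boolean stopsPromptly() throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread background = new Thread(() -> {
            while (!flag.stopRequested()) {
                // busy loop; each check re-reads the flag under the monitor
            }
        });
        background.start();
        Thread.sleep(100);
        flag.requestStop();
        background.join(5_000);
        return !background.isAlive();
    }
}
```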
Originally posted by Jim Yingst:
I don't see how this is a problem. Even if client A calls update from Thread #2, this is after Thread #1 has finished and updated the Main Memory. Even if Thread #2 is unsynchronized, it can't possibly begin until Thread #1 is finished, because the client can't call it until Thread #1 finished executing. And since Thread #1 has updated Main Memory, then Thread #2 is forced into picking up the correct data.
I'm with you right up to the last sentence. If thread 2 is not synchronized, it may not be forced to go to main memory; it may have another cached copy. This depends on where these different threads are coming from exactly. If the RMI implementation is using some sort of thread pool to field method invocations as needed, then thread 2 may have been previously used to handle other method calls related to the same record, in which case it might still have a cached copy of what it thinks is the cookie for that record. In which case no, it doesn't need to go to main memory to refresh the cookie.
So OK, I don't know much about how RMI implementations actually work here. Can the pooled threads retain any instance data about the objects they're servicing? Dunno; quite possibly they can't. Does the run() method of a pooled thread enter a synchronized block at any point before it services another remote invocation? That would be sufficient to flush its memory, I think. But such details are largely unknown to us, and the RMI Spec quoted above doesn't fill me with confidence in this respect. So I fall back on the words of the Book of Joshua 195:8-12:
[ June 01, 2003: Message edited by: Jim Yingst ]
After a thread is created, it must perform an assign or load action on a variable before performing a use or store action on that variable. (Less formally: a new thread starts with an empty working memory.)
Originally posted by Jim Yingst:
Note to self: use 'book of Joshua' phrase at first opportunity
Glad you liked it.
I'll see your book of Joshua, and raise you the new testament, per the Java threading spec, as presented by Saint Gosling, and more recently to boot.
So this would refer to the Java Language Specification, 2nd Edition? (Published 2000, while BoJ is 2001?) Chapter 17: More Than You Ever Wanted To Know About Threads And Should Have Known Better Than To Ask? Okey-doke...
It says, in effect, that before a Thread can start working with a variable, it needs to copy that variable in from Main Memory
The closest I can find to this is from 17.3
This requires that at some point before a variable can be used, it has to be loaded from main memory. But it doesn't have to have been done recently - if the thread has been around a while (part of the thread pool) it may have done the load some time ago - no obligation to repeat it. If there were a lock or volatile, I can find the rules that would specifically require a fresh load - but without lock or volatile, it's not required.
Remember, everything is being driven by the clientThread here: Thread #1 and Thread #2 are inside of its scope.
I'm not sure what this means. Is clientThread a thread running on the client? And threads #1 and #2 are on the server, right? Not sure what "scope" means in this context. I agree that things happen in a certain order as far as the client is concerned, but without synchronization or volatile, that's not necessarily seen the same on the server.
And we know that cookieNum is as Thread #1 left it, because the language spec demands that Thread #1 update it in Main Memory before terminating.
How do we know thread 1 has actually terminated? Though in this discussion, thread 1's lock() was synchronized, so I agree that the correct cookie value is in main memory at this point. I just don't think thread #2 is obligated to use it.
As a matter of fact, the reason that volatile's instability is tolerable is because there is a guaranteed workaround, per the above.
Agreed. The situation with volatile is vexing because it's a direct violation of specs - but it's always been possible to wrap accesses to a variable with sync locks, which achieves the same effect with slightly poorer performance. Volatile is of marginal use anyway; we can avoid it entirely for long and double, and be none the worse for wear. That's why Sun hasn't been compelled to fix it.
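The workaround mentioned above, wrapping every access in synchronized methods instead of declaring the field volatile, can be sketched as two hypothetical holder classes. For a long field, synchronization also rules out seeing a half-written 64-bit value, which the language spec permits for non-volatile long and double writes:

```java
// Relies on volatile semantics for visibility.
class VolatileCookie {
    private volatile long cookie;

    void set(long c) { cookie = c; }
    long get() { return cookie; }
}

// Same contract without volatile: every read and write takes the same monitor,
// which gives visibility and makes the 64-bit read/write atomic.
class SynchronizedCookie {
    private long cookie; // guarded by "this"

    synchronized void set(long c) { cookie = c; }
    synchronized long get() { return cookie; }
}
```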
Of course when you say "per the above" I'm not sure you're talking about replacing volatile with synchronization, as I am. But I agree there is a workaround.
So your concern, as I read it, is that RMI could be hanging on to copies of the variables in Main Memory from a previous calling thread, and that those copies could be out of date with what's currently in Main Memory?
AFAIK, this is possible, but I don't think it's legal, from a language spec point of view.
I disagree as discussed above; but it's a pretty hairy spec, so maybe there's something I'm missing.
If it were, then you really couldn't use RMI without synchronizing on the remote object, and I'm not aware of any such restriction.
Or the remote object implementation could synchronize on something for every read or write of mutable shared data. That's how I interpret the RMI Spec I cited early on: "a remote object implementation needs to make sure its implementation is thread-safe". The only truly thread-safe ways to write or read mutable data are through synchronization or (maybe) volatile. IMO and all that. But I think Josh is on my side here, and Saint Gosling is curiously silent on some key points.
[ June 02, 2003: Message edited by: Jim Yingst ]
Originally posted by Jim Yingst:
I thought that we had always agreed that you needed to synchronize inside of lock/unlock? It's only the modify that's in question.
Right - well I wasn't sure you knew the exact requirements for NX-Contractors, so I was stating the relevant parts in case.
But my main point in the last post was that, given that there's synchronization inside lock() and unlock(), it may be a good idea for other synchronization (e.g. update()) to be at the same level, in the same class, rather than somewhere further removed where it's harder to see how or if it interacts with the lock/unlock. My own original design had a number of different objects I sync on for various reasons, and while I understand how they interact I have to concede that it's a bit complicated for a junior programmer who might look at the code - there's a good chance they'd produce a deadlock by editing things carelessly. So I'm trying to simplify the use of synchronization in my code. Limiting the number of classes that use it is one aspect of this. Just bringing this up as a consideration, not a deal-breaker.
It's only the modify that's in question.
And create() and delete() if we do the sync inside Data, since Data does have these methods they should be implemented consistently with update(). And IMO read() might need some sync too, depending on how you've got things coded internally. But that's another set of discussions, never mind...
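A sketch of keeping all record operations on one monitor, assuming a hypothetical in-memory InMemoryData class (a null slot marks a deleted record; cookie checks are left out for brevity). create, read, update, and delete are all synchronized methods of the same object, so they are consistent with each other and would share a monitor with synchronized lock/unlock methods in the same class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the assignment's Data class.
class InMemoryData {
    private final List<String[]> records = new ArrayList<>(); // null entry = deleted

    synchronized int create(String[] fields) {
        records.add(fields.clone()); // defensive copy
        return records.size() - 1;   // record number of the new record
    }

    synchronized String[] read(int recNo) {
        String[] rec = records.get(recNo);
        if (rec == null) {
            throw new IllegalArgumentException("Record " + recNo + " is deleted");
        }
        return rec.clone();
    }

    synchronized void update(int recNo, String[] fields) {
        read(recNo); // reuses the deleted-record check; the monitor is reentrant
        records.set(recNo, fields.clone());
    }

    synchronized void delete(int recNo) {
        read(recNo);
        records.set(recNo, null);
    }
}
```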