aspose file tools*
The moose likes Developer Certification (SCJD/OCMJD) and the fly likes Single table / Simple Locking - WeakHashMap vs WeakReferences Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)
Bookmark "Single table / Simple Locking - WeakHashMap vs WeakReferences" Watch "Single table / Simple Locking - WeakHashMap vs WeakReferences" New topic
Author

Single table / Simple Locking - WeakHashMap vs WeakReferences

Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi everybody,
Sorry for those guys I confused with my posts where I introduced the WeakReference / ReferenceQueue pair. As my posts on it span multiple threads, the best solution I found to repair is was to post this new thread.
Introduction
Many people here seem to follow Max's idea to use a static WeakHashMap to store locks, as a solution to break deadlocks due to crashed client(s) still owning locks, if they decide it is worth while to handle deadlocks.
It consists to store as map keys objects which uniquely identify the client connection (owner of the lock), and as values the records (recNos) locked. In the solutions I have seen here around, the key is a Data instance.

The main idea of the WeakHashMap solution
As WeakHashMaps store their keys as WeakReferences (objects which are automatically cleared by the garbage collector when there is no strong ("strong" = "normal" ) reference to them anymore - simplified on purpose), and clear their corresponding entries when they are notified (I would better write "when they know" - see below) that one (or more) of their keys have been cleared, it seems a compelling solution to handle crashed clients, as far as the WeakHashMap keys identify clients.
The related design choices
Before digging into the technical issues of that solution, let us have a look at the main design choices the solution implies (they are no issues per se, just prerequisites) :
  • You accept that your database system consists in and supports only one table (the map is static and records are identified as simple recNos). All people here (as far as I know and ... except me ), agree that it is acceptable for our assignment. OK.
  • You decided to handle the "crashed client" situation. Strangely, nearly as many people agree on the fact that there is no need to deal with crashed clients, while there are many posts on the question . Anyway Max, Andrew, Mark, Jim, I mean all the JavaRanch SCJD forum's gurus, agree on that.
  • You accept that, by design, you won't support more than one record lock granted at a time to a given client.
  • Your design is based on multiple Data instances, one per client (any solution based on some Data singleton is incompatible with this solution)


  • It means that locking design has no interest (or at least needs to be adapted) if your database is multi-tables by design because you decided so, or if you decided not to care about lost connections.
    BTW, as my own design is multi-tables, I won't introduce here my own way of locking which is a bit different.
    The issues of the WeakHashMap solution
    I see two issues :
    The key cannot be the recNo
    It must be some object representing the client, which is not natural and couterperformant.
    When some caller claims for a lock, it has to check in a loop if the given recNo is locked or not (waiting if true, returning if false) ... while the map against which this check must be performed offers no optimal way to do it (containsValue() vs containsKey()). As a matter of fact, the key (which is the fast-access data of the map, is even never queried). You'll agree
    with me that it is, by design, a quite bad example of a use of the Map interface.
    Moreover, that's just because of this contraint that a client cannot own multiple locks (a Map key must be unique).
    The lack of notification
    When a client crashes, its entries are cleared from the WeakHashMap. So far so good. But as they are silently, other callers waiting for a lock on the same recNos, wait forever.
    Andrew's partial solution
    Andrew, I say "partial" because it just solves the second issue. Besides the static WeakHashMap, you have a static int locksCount, incremented each time a lock is granted and decremented when it is released. Besides both, you have a Thread dedicated to that task, which will periodically (let's say every second) compare the locksCount value to the WeakHashMap.size(). If they are different, it means that some entry(ies) were cleared from
    the map and you just need to notifyAll() all waiting callers.
    The WeakReference solution
    It is quite similar (and nearly as simple), except both issues mentioned above are solved :
    The key is the recNo (locking is more performant and allows multiple locks to be granted per client), and lock claimers get notified in a similar way Andrew does with its locksCount solution.
    Compared Codes
    As we cannot post complet code solutions, the following code examples are incomplete on purpose (for example exceptions are ignored). Also be aware of the fact that I didn't test the code snippets below : they are posted just for comparison of both techniques. You'll notice that the solution based on WeakReferences is only 3 lines longer than the WeakHashMap solution (70<>73)). Not a big deal !
    WeakHashMap

    WeakReferences registred with a ReferenceQueue

    Best,
    Phil.
    [ November 04, 2003: Message edited by: Philippe Maquet ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Phil,
    Before we open the debate, let's establish common assumptions. To point, I'd like to challenge your first assumption.

    The key cannot be the recNo
    It must be some object representing the client, which is not natural and couterperformant.

    Actually, the key can be the record number, depending on your implantation. However, that's beside the point. I disagree with your assertion that this is 'not natural', and I further disagree that Data does not represent a client.
    Each Data is uniquely assigned to a client, per your stated assumption, and thus each Data does, in fact, uniquely represent a client. This is similar to a pattern used by, say, cell phones. When you buy your cell phone, you receive an Object that is unique to you: the phone itself. The system in place, the phone company, doesn't track Phil: they track the object they used to uniquely identify you as a part of their system: namely, your phone, and they indirectly tie that object to you by the protocols of usage. That is, they assume that when a call is made, it's made by the client(Phil). They base their entire business model on it. This is exactly what is happening here. We are assigning a unique object to each client( namely, the Data object), and we are tracking that unique object.
    As to your first point: would you mind spelling out for me what 'couterperformant' means to you? I don't want to start debating a strawman

    When some caller claims for a lock, it has to check in a loop if the given recNo is locked or not (waiting if true, returning if false) ... while the map against which this check must be performed offers no optimal way to do it (containsValue() vs containsKey()). As a matter of fact, the key (which is the fast-access data of the map, is even never queried). You'll agree
    with me that it is, by design, a quite bad example of a use of the Map interface.


    I wold counter this is not a bad thing. A great many business don't [b]want[/b[ their users updating more then a single record at a time. An airline I worked for was this way. There's no justification in assuming that this 'feature' is something the client wants. It may simply be what you want to code
    M


    Java Regular Expressions
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Max,
    Actually, the key can be the record number, depending on your implantation. However, that's beside the point. I disagree with your assertion that this is 'not natural', and I further disagree that Data does not represent a client.

    I think you misunderstood me (my English is quite poor). Of course Data may represent the client, as it is unique per client. What I feel "not natural" with the WeakHashMap solution is that you must store the Data instance as a key instead of a value. WeakHashMap stores its keys as WeakReferences, right ? So I don't understand how you could invert, with a WeakHashMap implementation, the key/value pair (I mean storing recNos as keys and Data instances as values).
    Each Data is uniquely assigned to a client, per your stated assumption, and thus each Data does, in fact, uniquely represent a client. This is similar to a pattern used by, say, cell phones. When you buy your cell phone, you receive an Object that is unique to you: the phone itself. The system in place, the phone company, doesn't track Phil: they track the object they used to uniquely identify you as a part of their system: namely, your phone, and they indirectly tie that object to you by the protocols of usage. That is, they assume that when a call is made, it's made by the client(Phil). They base their entire business model on it. This is exactly what is happening here. We are assigning a unique object to each client( namely, the Data object), and we are tracking that unique object.

    I agree at 100% (see above).
    As to your first point: would you mind spelling out for me what 'couterperformant' means to you? I don't want to start debating a strawman

    Sorry, but I don't understand your sentence I don't know what a strawman is (and I don't find it in my dictionary). I see that I wrote 'couterperformant' instead of 'counterperformant', while the latter, even "corrected", is maybe not an english word : just a free translation of "contre-performant" in french. What I meant is simply this : Map.containsValue() is less efficient than Map.containsKey(). So if recNos are stored as values (the key being a Data), the lock method() is not optimized. Nothing more than that.
    Best,
    Phil.
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Philippe Maquet:
    Hi Max,
    quote:
    --------------------------------------------------------------------------------
    Actually, the key can be the record number, depending on your implantation. However, that's beside the point. I disagree with your assertion that this is 'not natural', and I further disagree that Data does not represent a client.
    --------------------------------------------------------------------------------
    I think you misunderstood me (my English is quite poor). Of course Data may represent the client, as it is unique per client. What I feel "not natural" with the WeakHashMap solution is that you must store the Data instance as a key instead of a value. WeakHashMap stores its keys as WeakReferences, right ? So I don't understand how you could invert, with a WeakHashMap implementation, the key/value pair (I mean storing recNos as keys and Data instances as values).

    Let's let the issue how you could invert the key/value stand for a moment, and focus on the 'unnatural' issue, since it's independent. IMO, it's quite intuitive for a given objet to retain a lock. Or, as the example below shows, for the proxy of that object to retain a lock. This was my point in the example below. The cell phone, in metaphor, is a stand in for a Data object, hopefully demonstrating that, since this is actually the way some things are modeled, it is, in fact, natural.
    In situations where there is a one to one relationship, say between a husband and a wife, it's just as natural to say that the wife owns the husband as it is to say that the husband owns the wife. Of course, I hear this doesn't strictly apply in Paris , but you get my point. Since we haven't established that anything other then a 1-1 solution is called for, I think it's just as natural to say that the client owns the lock, as it is to say that the lock owns the client. If anything, the latter seems somewhat unnatural to me, from an OO prespective. I don't think of my self as belonging to my stuff, I think of them as belonging to me. Of course, this is simply a matter of prespective, and I'm sure one is as good as the other. However, I'm not convinced that one is better then the other.

    quote:
    --------------------------------------------------------------------------------
    Each Data is uniquely assigned to a client, per your stated assumption, and thus each Data does, in fact, uniquely represent a client. This is similar to a pattern used by, say, cell phones. When you buy your cell phone, you receive an Object that is unique to you: the phone itself. The system in place, the phone company, doesn't track Phil: they track the object they used to uniquely identify you as a part of their system: namely, your phone, and they indirectly tie that object to you by the protocols of usage. That is, they assume that when a call is made, it's made by the client(Phil). They base their entire business model on it. This is exactly what is happening here. We are assigning a unique object to each client( namely, the Data object), and we are tracking that unique object.
    --------------------------------------------------------------------------------
    I agree at 100% (see above).

    quote:
    --------------------------------------------------------------------------------
    As to your first point: would you mind spelling out for me what 'couterperformant' means to you? I don't want to start debating a strawman
    --------------------------------------------------------------------------------
    Sorry, but I don't understand your sentence I don't know what a strawman is (and I don't find it in my dictionary).

    A straw man argument is one in which one party misinterprets the points of the other party, and addresses that misinterpretation. For example, if you made a comment about stupid American tourists, and I launched into a tirade about how Americans aren't stupid, I would be making a strawman of your argument: I wouldn't be addressing your point, I would be addressing a point that you didn't make, in order to win that point. I want to avoid doing that, so I asked for your clarification.

    I see that I wrote 'couterperformant' instead of 'counterperformant', while the latter, even "corrected", is maybe not an english word : just a free translation of "contre-performant" in french.

    I only wish I spoke French as well as you speak English.

    What I meant is simply this : Map.containsValue() is less efficient than Map.containsKey().

    Why would you say that?
    M
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [PM]: What I meant is simply this : Map.containsValue() is less efficient than Map.containsKey().
    [MH]: Why would you say that?

    Maps, Hash- or Tree-, are designed to look thinks up by their keys, and quickly. Implementing contains() is pretty similar to just doing a get() and seeing if it's null or not. For a HashMap, you just calculate the hashCode(), look in the appropriate bucket, and maybe iterate through a reasonably small number of entries that are in that same bucket. It's O(1). For a TreeMap it's O(log(n)) but still pretty quick. In contrast, for containsValue() you have no choice but to iterate through every single entry in the Map, and see if the value matches your target. Best-case scenario you get lucky and find your target early on, and you can abort the loop, but don't count on it. It's an O(n) operation.


    "I'm not back." - Bill Harding, Twister
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Jim Yingst:
    [PM]: What I meant is simply this : Map.containsValue() is less efficient than Map.containsKey().
    [MH]: Why would you say that?

    Maps, Hash- or Tree-, are designed to look thinks up by their keys, and quickly. Implementing contains() is pretty similar to just doing a get() and seeing if it's null or not. For a HashMap, you just calculate the hashCode(), look in the appropriate bucket, and maybe iterate through a reasonably small number of entries that are in that same bucket. It's O(1). For a TreeMap it's O(log(n)) but still pretty quick. In contrast, for containsValue() you have no choice but to iterate through every single entry in the Map, and see if the value matches your target. Best-case scenario you get lucky and find your target early on, and you can abort the loop, but don't count on it. It's an O(n) operation.

    Jim is correct here: I was thinking about fact that both are accessible as Sets, through the Map.keySet and Map.entrySet methods, thus equally fast to look through: I didn't think of the simple existence test. However, as far I know, this is an implementation choice: I'm not aware of any specification based requirement that Maps be implemented in this manner, nor, AFIK, can you depend on them being so on different platforms.
    I would also maintain that speed isn't a factor in this project. Frankly, given the number of records at issue, I doubt the time differential is even measurable. This leads us back to the same point: Namely, that ordering record-to-Data satisfies non existent requirements: namely, speed(probably) and multiple record locking. Thus, IMO, ordering Data-to-record is just as natural.
    M
    John Smith
    Ranch Hand

    Joined: Oct 08, 2001
    Posts: 2937
    Best-case scenario you get lucky and find your target early on, and you can abort the loop, but don't count on it.
    As a matter of fact, the worst-case scenario is more likely, as most of the time the record would not be locked and therefore will not be in the map. Thus, you would have to iterate through the entire collection only to realize that the object you are looking for is not there. This would be a very nonstandard use of a Map, indeed.
    In the app that I work on at the moment, we actually had a similar problem, where an ArrayList was used as a collection which was used to test for a membership. Swapping the ArrayList with the properly structured Map gave us a 500% boost in performance!
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Max]: I'm not aware of any specification based requirement that Maps be implemented in this manner, nor, AFIK, can you depend on them being so on different platforms.
    Strictly speaking, that's true. At the generic Map level, the API says containsValue() "will probably require time linear in the map size for most implementations of the Map interface." Nothing is said about containsKey(). Looking at specific implementations however - TreeMap does guarantee O(log(n)) for containsKey(). HashMap doesn't mention getKey()'s performance, but it does guarantee O(1) for get(), and common sense is that constainsKey() should be the same. Neither implementation says anything more about containsValue()'s performance (though TreeMap inherits the API comment from Map here, probably linear time). Unless someone magically comes up with a way to beat O(log(n)) or O(1) for containsValue() on a TreeMap or HashMap respecively, we can be pretty sure containsKey() will offer better performance for these two classes at least.
    As for different platforms - well, HashMap and TreeMap are just using non-native code really, using standard Sun SDKs. The behavior might be different on other JDKs maybe, but not other platforms, IMO.
    I would also maintain that speed isn't a factor in this project.
    This I'll basically agree with, along with the remainder of the paragraph.
    [Eugene]: As a matter of fact, the worst-case scenario is more likely
    Also agreed.
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Many thanks to you, Jim and Eugene, for having entered the discussion to argue about why it's more efficient to access a HashMap by key than by value. "Not natural" in my mind (direct translation from french) meant "counterintuitive and less efficient".
    Anyway, you are far more efficient than I can be to debate with Max. As we don't master the language (english) at the same level, sometimes I feel myself as he would feel himself if he had to kickbox against me with hands and feets bound...
    Now about that Map access discussion, I would argue that it is not a specification question. I use a HashMap (or a WeakHashMap) which are implementations. Any user of them gets their sources. Even if the doc is unclear (I don't pretend that), it's so easy to see (when you are curious as I am) what is the job performed by containskey() and containsValue().
    I passed SCJP in may 2003 (correctly : 95%). One of the things you have to know to pass SCJP is how to recognize, among the few collections offered by the API, which is the best one to be used in a given context. It is one of the easiest parts of SCJP, because those classes are well named. It means that any junior java programer (SCJP level) knows that a HashMap is a Map (key-value pair), where keys are stored in buckets based on their hashCode(), a structure which favours acces by key over by value.
    Well, anyway I initiated this thread just to handle a locking issue, especially to reply to Ulrich's question (I am not concerned by the given issue anyway), there is not any comment about the compared solutions above and ... Ulrich seems to have quit us. Some days, I'd better to go and sleep sooner
    Best,
    Phil.
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    /*
    Originally posted by Philippe Maquet:
    Many thanks to you, Jim and Eugene, for having entered the discussion to argue about why it's more efficient to access a HashMap by key than by value. "Not natural" in my mind (direct translation from french) meant "counterintuitive and less efficient".
    Anyway, you are far more efficient than I can be to debate with Max. As we don't master the language (english) at the same level, sometimes I feel myself as he would feel himself if he had to kickbox against me with hands and feets bound...

    I would never kick or punch you....I'd probably just grab you and throw you . But seriously, you make too much of English Issue: I think you argue well.

    Now about that Map access discussion, I would argue that it is not a specification question. I use a HashMap (or a WeakHashMap) which are implementations. Any user of them gets their sources. Even if the doc is unclear (I don't pretend that), it's so easy to see (when you are curious as I am) what is the job performed by containskey() and containsValue().
    I passed SCJP in may 2003 (correctly : 95%). One of the things you have to know to pass SCJP is how to recognize, among the few collections offered by the API, which is the best one to be used in a given context. It is one of the easiest parts of SCJP, because those classes are well named. It means that any junior java programer (SCJP level) knows that a HashMap is a Map (key-value pair), where keys are stored in buckets based on their hashCode(), a structure which favours acces by key over by value.

    Well, I hope I qualify as a Junior Programmer, and while I 'knew' it, I missed it. It's true that ordered keys, such as hashMaps keys, are much more easy to search then unordered ones(like Hashmap values). However, it bears keeping in mind that Maps provide access to both their Keys and their values as Sets.
    However, this part of the discussion, IMO, is some tangential: it doesn't really matter, because speed is not a consideration in this project. Also, the difference in speed here, given the size of the records, is immeasurable. Thus, the argument reduces to 'ordering records to Data is better then ordering Data to records because it's immeasurably faster, and speed is a non issue in this assignment'. Personally, I'm not convinced by this.

    Well, anyway I initiated this thread just to handle a locking issue, especially to reply to Ulrich's question (I am not concerned by the given issue anyway), there is not any comment about the compared solutions above and ... Ulrich seems to have quit us. Some days, I'd better to go and sleep sooner
    Best,
    Phil.

    I think it's a worthwhile discussion, I just want to clarity the assumptions we'l be working with. So far, I don't think we've established a requirement or OO based reason to map records to Data, while I have offered some examples of why it's intuitive, in somewhat similar cases, to map Data to records.
    All best,
    M
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Hey Jim,
    Originally posted by Jim Yingst:
    [QB][Max]: I'm not aware of any specification based requirement that Maps be implemented in this manner, nor, AFIK, can you depend on them being so on different platforms.
    Strictly speaking, that's true.

    My point is that when you talking about a project designed to run consistently on different OS systems and/or different JVMs, it's important to speak strictly, especially when you're showcasing just what a through and competent and careful software engineer you are. Just my opinion

    .snip...and common sense is that constainsKey() should be the same. Neither implementation says anything more about containsValue()'s performance (though TreeMap inherits the API comment from Map here, probably linear time). ...snip.. we can be pretty sure containsKey() will offer better performance for these two classes at least.

    While this is true, I wouldn't advise using 'common sense' and 'probably' as elements in your design choices document. Again, just my advice. If you can't make a statement about the nature of the APIs that's guaranteed to be platform independent, then I suggest that you not make it.

    As for different platforms - well, HashMap and TreeMap are just using non-native code really, using standard Sun SDKs. The behavior might be different on other JDKs maybe, but not other platforms, IMO.

    Why not? two implementations of the same JDK on two different platforms are simply two different programs. I'm not aware that Sun guarantees that the same algorithm will be used in both cases.

    All best,
    M
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Max,
    Thanks for your reply.
    Also, the difference in speed here, given the size of the records, is immeasurable.

    I suppose that while writing "given the size of the records" you were thinking of the number of locks granted at a given same time. Even if the database has 100000 records, you probably won't have more than an average 100 records locked along them. So yes, the difference in speed will be immeasurable.
    I would just comme back on the context of my first post here :
    The first time I saw a locking solution based on a WeakHashMap mapping Data to records,
  • I asked : Why do you map Data to records instead of records to Data ? It seemed conterintuitive to me and moreover forced to limit the number of locks per Data to only one at a time. My question was stupid because Data (which represents the owner of the lock) must be the key in the WeakHashMap.
  • I pointed out the lack of notification which made its solution buggy.


  • Now, from his answer, I got the feeling that his design had been driven by its code and not the contrary :
    At first he had mapped records to Data (more intuitive for him too), and then he inverted the mapping because the WeakHashMap container forced him to do so. With the consequence that he couln't grant more that one lock per client at the time... Then he concluded that it was acceptable because in his current application he doesn't need more than one lock per client. So coding constraints made his final design.
    I would like to add just one argument in favour of records mapped to Data as WeakReferences in a regular HashMap, against Data mapped to records in a WeakHashMap : avoiding the WeakHashMap overhead. As the latter stores its keys as WeakReferences, it needs an internal Entry class extending WeakReference, mainly to override equals() and hashCode(). While when Data are stored in the map as values, the "basic" WeakReference is enough, the default Object equals() (which is faster) being perfect for our purpose.
    Thank you for all your comments, Max, there were very interesting to me.
    All best,
    Phil.
    [ September 12, 2003: Message edited by: Philippe Maquet ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Philippe Maquet:
    Hi Max,
    Thanks for your reply.

    I suppose that while writing "given the size of the records" you were thinking of the number of locks granted at a given same time. Even if the database has 100000 records, you probably won't have more than an average 100 records locked along them. So yes, the difference in speed will be immeasurable.
    I would just comme back on the context of my first post here :
    The first time I saw a locking solution based on a WeakHashMap mapping Data to records,
  • I asked : Why do you map Data to records instead of records to Data ? It seemed conterintuitive to me and moreover forced to limit the number of locks per Data to only one at a time. My question was stupid because Data (which represents the owner of the lock) must be the key in the WeakHashMap.
  • I pointed out the lack of notification which made its solution buggy.


  • Now, from his answer, I got the feeling that his design had been driven by its code and not the contrary :
    At first he had mapped records to Data (more intuitive for him too), and then he inverted the mapping because the WeakHashMap container forced him to do so. With the consequence that he couln't grant more that one lock per client at the time... Then he concluded that it was acceptable because in his current application he doesn't need more than one lock per client. So coding constraints made his final design.

    But this is common practice in all forms of engineering: that's the difference between engineers and scientists!
    Architects frequently limit the size and height of their buildings due to construction materials, local statues, etc. electrical engineers will use one metal over another because it's a better conductor, Chemists with use one element over another because it's cheaper. Doctors prescribe one medicine over another because it's covered by the patient's insurance. It's part of the game of getting along with out environment and it's at the heart of practice-to-theory feedback loop that allows engineering to make sure the world continues to spin property. I don't think he compromised his design: I think he made it better by taking reality into account


    I would like to add just one argument in favour of records mapped to Data as WeakReferences in a regular HashMap, against Data mapped to records in a WeakHashMap : avoiding the WeakHashMap overhead. As the latter stores its keys as WeakReferences, it needs an internal Entry class extending WeakReference, mainly to override equals() and hashCode(). While when Data are stored in the map as values, the "basic" WeakReference is enough, the default Object equals() (which is faster) being perfect for our purpose.

    First, this harkens back to the efficiency issue, which we've establish is non material to the SCJD. Second, this violates the directive of using standard Java classes, even if a custom class is marginally faster. Finally, there's actually an excellent reason why they did this(the overloading), as elegantly explained in Joshua Brock's 'Effective Java'.

    Thank you for all your comments, Max, there were very interesting to me.
    All best,
    Phil.
    [ September 12, 2003: Message edited by: Philippe Maquet ]

    No problem. I try to set aside about an hour an day for the 'Ranch, though today it's been closer to two hours, so my apologies if I don't get to any other topics today.
    All best,
    M
    [ September 12, 2003: Message edited by: Max Habibi ]
    John Smith
    Ranch Hand

    Joined: Oct 08, 2001
    Posts: 2937
    PM: As the latter [the Data->Record map, -- my insert] stores its keys as WeakReferences, it needs an internal Entry class extending WeakReference, mainly to override equals() and hashCode().
    Actually, the internal Entry class in the WeakHashMap class extends Map.Entry, not WeakReference. You were probably thinking of WeakKey that does extends WeakReference, but then again, you don't need to override either the equals() method of WeakKey or the equals() method of Data. The equals() method of WeakKey simply calls the equal method of Data, and if the Data method is not overriden, it will test for object identity, which is exactly what should be expected.
    In fact, overriding the equals() and the hashCode() methods of Data may be lethal (in terms of you hoping to achive the automatic GC of weakly referenced objects) if you are planning to use it as a key in the Data->Record map.
    Note that javadoc for WeakHashMap specifically warns of that:
    With such recreatable key objects, however, the automatic removal of WeakHashMap entries whose keys have been discarded may prove to be confusing.
    All the subtleties of WeakHashMap aside, I don't like the idea of instance of Data being used either as a key or the value in the locks collection. But we've beaten that topic to death before.
    [ September 12, 2003: Message edited by: Eugene Kononov ]
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Eugene,
    Actually, the internal Entry class in the WeakHashMap class extends Map.Entry, not WeakReference.

    Sorry, but here is how Entry is defined in WeakHashMap.java :

    Best,
    Phil.
    John Smith
    Ranch Hand

    Joined: Oct 08, 2001
    Posts: 2937
    Sorry, but here is how Entry is defined in WeakHashMap.java :
    private static class Entry extends WeakReference implements Map.Entry

    Here is my version (from JDK1.3):
    static private class Entry implements Map.Entry
    Yours is probably form JDK1.4.
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    <post topic="Map performance" relevance="tangential">
    Hola, Max!
    [Max]: My point is that when you talking about a project designed to run consistently on different OS systems and/or different JVMs, it's important to speak strictly, especially when you're showcasing just what a through and competent and careful software engineer you are. Just my opinion
    And my point later on was that it's easy to be more specific on this issue by focusing on one of the standard concrete Map implementations, which do offer better guarantees. In my own posts on this topic, I've been specifically talking about HashMap and TreeMap.
    [Jim]: ...and common sense is that constainsKey() should be the same. Neither implementation says anything more about containsValue()'s performance (though TreeMap inherits the API comment from Map here, probably linear time). ...snip.. we can be pretty sure containsKey() will offer better performance for these two classes at least.
    [Max]: While this is true, I wouldn't advise using 'common sense' and 'probably' as elements in your design choices document. Again, just my advice.

    Well yeah, I also won't say things like "dunno" and "yeah". But I'm not writing the design doc right here and now, I'm posting in the Big Moose Saloon.
    [Max]: If you can't make a statement about the nature of the APIs that's guaranteed to be platform independent, then I suggest that you not make it.
    Like, say, about FileChannel? In the case of HashSet, since they neglected to offer an O(1) guarantee for containsKey(), even though it's blindingly obvious that that's the behavior we should expect, I suppose that we can just ignore the containsKey() method and replace it with (map.get(key) != null). (As long as you never insert null values into the Map, which is an easy design choice to make.) There, we now have nice guaranteed behavior for HashMap. We already had an O(log(n)) guarantee for TreeMap, without having to resort to such silliness.
    So: we have an O(1) guarantee for HashMap's get(). We have O(log(n)) for TreeMap's containsKey(). For containsValue(), any implementation, we have no guarantee, but it's "probably" O(n). (Sorry for unprofessional use of the word "probably", but it's from the API after all.) And in reality, it is O(n) for HashMap and TreeMap. So OK, there's a theoretical possibility that on some JDKs/platforms/alternate realities the containsValue() might turn out faster than containsKey(). Best case scenio, we might get lucky somehow. But there's no denying that for HashMap and TreeMap, get() and containsKey() offer better guarantees for wort-case (and typical) behavior. Heck, containsValue() could legally turn out to be O(n^10) for all we know - containsKey() is still O(1), or O(log(n)). Which would you choose?
    [Jim]: As for different platforms - well, HashMap and TreeMap are just using non-native code really, using standard Sun SDKs. The behavior might be different on other JDKs maybe, but not other platforms, IMO.
    [Max]: Why not? two implementations of the same JDK on two different platforms are simply two different programs. I'm not aware that Sun guarantees that the same algorithm will be used in both cases.

    Note that I didn't use "guarantee" in that paragraph; I said IMO, and was speaking more loosely. What I meant was that if I run J2SDK 1.4.2 on a Windows box, and on a Sun box, and on a Linux box, they all use the exact same rt.jar file, which has the exact same non-native Java code. I think. Guess I'm not 100% sure of this, and I can't easily check it right now. But I've always understood that the differences between these versions were only in native methods and in the JVM infrastructre, not in the rt.jar shared library. OTOH, if you switch to a different JDK version, this goes out the window - rt.jar will be different.
    Regardless, even if there's no guarantee that the algorithms will be the same on different JDKs or platforms - Sun does offer the performance guarantees for HashMap and TreeMap that I've been talking about. Alternate platform implementations can implement however they want, but they're required to give O(1) performance for HashMap's get(), and O(log(n) for TreeMap's containsKey().
    Which is why Philippe made the statement: Map.containsValue() is less efficient than Map.containsKey(). That's speaking a bit loosely - if you want we can say "HashMap's get() and TreeMap's containsKey() offer better performance guarantees than Map's containsValue()." But personally I think the original statement was quite clear enough for us to fill in the blanks.
    BTW, Philippe - I had the exact same reactions as Max reagarding your use of language. I wish I spoke French as well as you speak English, and I think you worry too much about your alleged deficiencies here. E.g. I had no problem figuring out what "counterperformant" meant. You're doing great; stop apologizing.
    </post>
    [ September 12, 2003: Message edited by: Jim Yingst ]
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Jim,
    BTW, Philippe - I had the exact same reactions as Max reagarding your use of language. I wish I spoke French as well as you speak English, and I think you worry too much about your alleged deficiencies here. E.g. I had no problem figuring out what "counterperformant" meant. You're doing great; stop apologizing.

    Thank you for that comment. After three months passed on this forum, I noticed that I improved both my Java experience and my english practice (for writing, I need less my dictionary than before). It's just when I see that I've been misunderstood that I wonder whether it's due to a language issue ... somethink like the feeling that if I could have explained it in french, it would have been easier to understand. Anyway, even you are misunderstood sometimes ... So I think you're right : I should loose that language complex. Promised !
    Best,
    Phil.
    [ September 14, 2003: Message edited by: Philippe Maquet ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Sorry Philippe, I didn't understand that. Could you repeat it more clearly please?
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Sorry Jim, but rereading me, I didn't understand my own english ! So I couldn't explain it, even in french ... It was quite vague indeed !
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Jim,
    I'm not sure how TreeMaps got into this discussion, but let me see if I can boil all of this down. Performance is irrelevant to the SCJD, and we can't make any definitive statements about the performance benefits of checking to see if a HashMap contains a value versus it containing a key.
    Regarding FileChannels. I think we'll just have to agree to disagree on that one. I don't think I have to capacity to communicate my points any more clearly then I did on that issue. If I haven't been convincing, then I'll have to chalk it up to my need to become a better communicator.
    All best,
    M
    [ September 15, 2003: Message edited by: Max Habibi ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Philippe Maquet:
    Sorry Jim, but rereading me, I didn't understand my own english ! So I couldn't explain it, even in french ... It was quite vague indeed !

    I think Jim was just teasing you
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    I'm not sure how TreeMaps got into this discussion
    Convenient example of a Map with an explicit performance guarantee. You and Philippe had made general comments about Map; I refernced specific implementations. We can stick to HashMap though.
    but let me see if I can boil all of this down. Performance is irrelevant to the SCJD,/[b]
    Ok.
    [b]and we can't make any definitive statements about the performance benefits of checking to see if a HashMap contains a value versus it containing a key.

    Can, did. Referring to both minimum guaranteed performance, and to de facto actual performace, but not to what might happen in alternate universes. If you explicitly say containsKey() rather than "containing a key" then I'll agree with you (thanks to another careless oversight in API writing), but as I can use get() to test the same thing with guaranteed O91) performance, I'm sticking with my answer; don't see a refutation. The issue is ultimately not that important though, so agreeing to disagree here is fine too.
    Regarding FileChannels. I think we'll just have to agree to disagree on that one.
    Ok. Didn't mean to renew the topic per se, I just found it interesting that our roles this time are to some extent reversed. Last time I was the one saying "the API doesn't make that guarantee" and you were saying "yes it does". Just a funny coincidence; no deeper meaning.
    I think Jim was just teasing you
    Yup.
    [ September 15, 2003: Message edited by: Jim Yingst ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Hi Jim,
    Originally posted by Jim Yingst:
    and we can't make any definitive statements about the performance benefits of checking to see if a HashMap contains a value versus it containing a key.
    Can, did. Referring to both minimum guaranteed performance, and to de facto actual performace, but not to what might happen in alternate universes. If you explicitly say containsKey() rather than "containing a key" then I'll agree with you (thanks to another careless oversight in API writing), but as I can use get() to test the same thing with guaranteed O91) performance, I'm sticking with my answer; don't see a refutation. The issue is ultimately not that important though, so agreeing to disagree here is fine too.


    Phil's point, which I think you supported, was that containsKeys was faster the containsValue. We've established that this statement isn't supported by the Spec, even if it happens to the true on most(maybe even all) implementations.
    Now, if your point has become that using hashMap.get is faster than hashmap.containsValue, then you're undermining the original point that using a record-key mapping is faster. To wit,

    is just as efficient as

    thus, while your point regarding get is important, it's irrelevant .
    The discussion was originally about one mapping style(record-Data) being more efficient (which I think I just established isn't true), and/or the discussion was about was about the merits of containsKey vs. containsValue(which merits we can't establish either, strictly speaking :which is the only we should be speaking). AFIK, the discussion was never about TreeMaps, map.get, or alternative universes
    All best,
    M
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Ummm, I can't tell what either of your code samples are supposed to do, Max. Doesn't contains() return a boolean? Doesn't equals() expect an Object? I can't determine how relevant get() is without a better example.
    AFIK, the discussion was never about TreeMaps, map.get, or alternative universes
    Well for our universe, for any existing HashMap implementation (or any Sun will ever release, I'll bet), containsKey() is more efficient than containsValue(). It's a pity they neglected to document that fact in the API for HashMap. But to assert that we shouldn't state that HashMap containsKey() is more efficient than containsValue() - well, it is. Documented or not. You're apparently ignoring the very purpose of Maps - to look thing up by key. Looking up by value is an afterthought.
    OK, gotta take a deep breath now. Hey Max, I don't suppose you wanna chime in on this thread? You know, if you get bored or something. Cheers...
    [ September 15, 2003: Message edited by: Jim Yingst ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Jim Yingst:
    Ummm, I can't tell what either of your code samples are supposed to do, Max. Doesn't contains() return a boolean? Doesn't equals() expect an Object? I can't determine how relevant get() is without a better example.

    Oops, substitute get for contains. And equals expects an object, per the Object.equals method.
    M

    AFIK, the discussion was never about TreeMaps, map.get, or alternative universes
    Well for our universe, for any existing HashMap implementation (or any Sun will ever release, I'll bet), containsKey() is more efficient than containsValue(). It's a pity they neglected to document that fact in the API for HashMap.

    I'm guessing they didn't document it because they wanted to be able to change it in the future: perhaps by making containsValue more efficient.

    But to assert that we shouldn't state that HashMap containsKey() is more efficient than containsValue() - well, it is. Documented or not. You're apparently ignoring the very purpose of Maps - to look thing up by key. Looking up by value is an afterthought.

    Jim, this is just nonesense. If it's not true per spec, then you shouldn't assume it true across the board. For example, it would be easy enough to keep a reverse map in values to keys(with a few tricks: not too many), that would reduce the time of containsValue to be comparable to containsKey. Are you willing to bet your certification fee that it isn't? For that matter, are you willing to bet the certification fee of the people you're advising?

    OK, gotta take a deep breath now. Hey Max, I don't suppose you wanna chime in on this thread? You know, if you get bored or something. Cheers...
    [ September 15, 2003: Message edited by: Jim Yingst ]

    heh. You couldn't get me near that crap for all the beer in England. I'm suprised Joe's so involved
    M
    [ September 15, 2003: Message edited by: Max Habibi ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Oops, substitute get for contains.
    OK, so we are indeed talking about HashMap's get(i), which is guaranteed to be O(1), rather than containsKey(), which is not guaranteed, even though for all practical purposes it is in fact O(1). Good, that makes things easier.
    [Jim]: Well for our universe, for any existing HashMap implementation (or any Sun will ever release, I'll bet), containsKey() is more efficient than containsValue(). It's a pity they neglected to document that fact in the API for HashMap.
    [Max]: I'm guessing they didn't document it because they wanted to be able to change it in the future: perhaps by making containsValue more efficient.

    Ummm... OK. I was talking about the lack of documentation for the fact that HashMap's containsKey() is O(1). Fortunatelly, this is no longer relevant, because now I can concentrate on the fact that HashMap's get() is documented to be O(1). Which is what I wanted.
    Jim, this is just nonesense. If it's not true per spec, then you shouldn't assume it true across the board.
    Mmmm, irony.
    Oops, I guess I shouldn't have brought up the generic Map again, since it's more productive for me here to talk HashMaps (as I did originally). OK, I retract my last couple statements about the purpose of Maps, and containsValue() being an afterthought. It's generally true, but yeah, there can be exceptions at the Map level. I still stand by my assertion that for any HashMap implementation in existence, and any Sun is likely to release ever, containsKey() is (on average, for any significant value of N) faster than containsValue(). Note that I do not include classes that extend from HashMap in this statement - just HashMap's implementation. (I've been talking about HashMap's get() vs. MashMap's containsValue(), after all.)
    For example, it would be easy enough to keep a reverse map in values to keys(with a few tricks: not too many), that would reduce the time of containsValue to be comparable to containsKey.
    Agreed, they can be made equal. I've made a DoubleMap myself, and later seen similar stuff in one or more open-source libraries. So that's an exception for Map, but not HashMap.
    Are you willing to bet your certification fee that it isn't? For that matter, are you willing to bet the certification fee of the people you're advising?
    For Map: no, I misspoke. For HashMap (from the first half of the paragraph you said is "just nonsense": yes, if there were something to be gained if I win (e.g. if you'd but against me), and if we could agree on a suitable means of resolving the issue. I suppose my assertion about future releases from Sun is unresolvable unless we were to limit the scope - say, up through JDK 1.5 when it's released?
    Obviously, I can't bet the money of others, but I am indeed willing to encourage others to bet their money against you, and suffer their ire if I'm wrong.
    I suspect that if we pursue this we'd have a hard time agreeing on an exact formulation of what we're betting on. My basic premise is this: if an engineer says that Map containsKey() is more efficient than containsValue() (in casual conversation, and note the lack of the word "always"), that's OK, they're basically right. And if another engineer immediately qualifies the statement to say that HashMap's get() is more efficient than containsValue(), that's even more right, and making such an assertion is hardly something to be avoided; it should be seen as self evident. But I'm not sure how to construct a test of who's "right" for something like this.
    [ September 15, 2003: Message edited by: Jim Yingst ]
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Max and Jim,
    Max:
    Now, if your point has become that using hashMap.get is faster than hashmap.containsValue, then you're undermining the original point that using a record-key mapping is faster. To wit,

    code:
    --------------------------------------------------------------------------------
    Integer recordNumber = //yadda yadda yadda
    recordNumber.equals(records.get(this))
    --------------------------------------------------------------------------------
    is just as efficient as
    code:
    --------------------------------------------------------------------------------
    Integer recordNumber = //yadda yadda yadda
    this.equals(records.get(recordNumber))
    --------------------------------------------------------------------------------
    thus, while your point regarding get is important, it's irrelevant .
    The discussion was originally about one mapping style(record-Data) being more efficient (which I think I just established isn't true), and/or the discussion was about was about the merits of containsKey vs. containsValue(which merits we can't establish either, strictly speaking :which is the only we should be speaking). AFIK, the discussion was never about TreeMaps, map.get, or alternative universes.

    Sorry Max, but you didn't establish anything with that code except that HashMap.get(aData) is as efficient than HashMap.get(anInteger). To complete that trail I would add that the first is probably more efficient than the second because of the (probable) better efficiency of Data.equals() inherited from Object ...
    But even if you established something with your code snippets, it's irrelevant with the lock() issue :
    With this line your are in the Data to recNos mapping case :
    [CODErecordNumber.equals(records.get(this))[/CODE]
    There are just two possible cases when entering in the lock() method :
  • records.get(this) is not null : it is an error condition (because a given Data may not own more than one lock at any time), or you test recNo and make lock() reentrant. But in both cases you stop here.
  • records.get(this) is null : it means nothing else than this : lock attempts may start.


  • The main difference between both mapping styles are in the while() loop surrounding the wait() :
    With the recNos to Data mapping, I could replace :

    by

    (it's just less readabale but as efficient)
    But now with the Data to recNos mapping, by what could I replace
    ?
    Where a get() could be used ?
    Best,
    Phil.
    [ September 16, 2003: Message edited by: Philippe Maquet ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Jim Yingst:
    OK, I retract my last couple statements about the purpose of Maps, and containsValue() being an afterthought. It's generally true, but yeah, there can be exceptions at the Map level.

    Ok, we're getting there

    I still stand by my assertion that for any HashMap implementation in existence, and any Sun is likely to release ever, containsKey() is (on average, for any significant value of N) faster than containsValue(). Note that I do not include classes that extend from HashMap in this statement - just HashMap's implementation. (I've been talking about HashMap's get() vs. MashMap's containsValue(), after all.)

    So, classes that extend HashMap are not HashMaps? Holy Convenient-if-Un-Object-Oriented-Qualifications Batman!

    For example, it would be easy enough to keep a reverse map in values to keys(with a few tricks: not too many), that would reduce the time of containsValue to be comparable to containsKey.
    Agreed, they can be made equal. I've made a DoubleMap myself, and later seen similar stuff in one or more open-source libraries. So that's an exception for Map, but not HashMap.

    Actually, Hashmaps might very well implement this in the future, and be 100% spec compliment, so this is an unfounded statement.

    Are you willing to bet your certification fee that it isn't? For that matter, are you willing to bet the certification fee of the people you're advising?
    For Map: no, I misspoke. For HashMap (from the first half of the paragraph you said is "just nonsense": yes, if there were something to be gained if I win (e.g. if you'd but against me), and if we could agree on a suitable means of resolving the issue.

    I don't understand what you're saying here?

    I suppose my assertion about future releases from Sun is unresolvable unless we were to limit the scope - say, up through JDK 1.5 when it's released?

    It's not unresolvable at all. My assertion is that there is no way you could know. AFIK, unless you're in on discussions I'm not, there is no way you could know . And if you don't know it to be true then, as a Engineer, you shouldn't assert it as a fact

    Obviously, I can't bet the money of others, but I am indeed willing to encourage others to bet their money against you, and suffer their ire if I'm wrong.

    Well, ok, but you can see where I might not be comfortable with that?

    I suspect that if we pursue this we'd have a hard time agreeing on an exact formulation of what we're betting on. My basic premise is this: if an engineer says that Map containsKey() is more efficient than containsValue() (in casual conversation, and note the lack of the word "always"), that's OK, they're basically right.

    in casual conversation, yes, I agree. They might assert anything, foolish or not, an casual conversation.

    And if another engineer immediately qualifies

    I would say corrects

    the statement to say that HashMap's get() is more efficient than containsValue(), that's even more right, and making such an assertion is hardly something to be avoided; it should be seen as self evident.

    Ah, but there's the rub. It's not more right in general, it's only more right in specific cases. I could(or you could) code up an implementation in about 10 minutes that has a containsValue that's just as fast as a get. To me, this just isn't a responsible statement.

    But I'm not sure how to construct a test of who's "right" for something like this.

    Well, I think you would be right, if this weren't an implementation issue. However, since that's all it is, then I'm right
    M
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118

    by

    (it's just less readabale but as efficient)
    But now with the Data to recNos mapping, by what could I replace
    ?
    Where a get() could be used ?
    Best,
    Phil.
    [ September 16, 2003: Message edited by: Philippe Maquet ][/QB]<hr></blockquote>
    I had a little trouble following your post, but I think your basic point is that the lock method would still have to use the containsValue method. While this is true, I was thinking of the unlock method. My apologies if my example wasn't sufficiently clear on this point.
    I had a little trouble following your post, but I think your basic point is that the lock method would still have to use the containsValue method. While this is true, I was thinking of the unlock method. My apologies if my example wasn't sufficiently clear on this point.
    Basically, the justification I'm hearing for the record-Data mapping style is that it's (imperceptibly) more efficient (because of the number of records in question) in one method: lock. Additionally, it requires somewhat unreadable coding practices( as you yourself asserted).
    You're favoring efficiency over clarity here, which is direct contradiction to the requirements of the exam, IMO. When I balance this against the simple elegance of Data-record mapping( in terms of how easily it clears out lost client with a WeakHashmap), I find myself unconvinced. Of course, that's just my opinion.
    [ September 16, 2003: Message edited by: Max Habibi ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Jim]: It's generally true, but yeah, there can be exceptions at the Map level.
    [Max]: Ok, we're getting there

    This isn't something new, it's what I said in my first post on the topic. Philippe's statement was not strictly true at the Map level, but we can easily shift to HashMap (or TreeMap). So OK, I briefly made mention of plain old Map again later, oops. Let's not forget what I've been saying the rest of the time.
    [Max]: So, classes that extend HashMap are not HashMaps? Holy Convenient-if-Un-Object-Oriented-Qualifications Batman!
    They would be, which is why I put in the qualifying statement. Note that my earlier comments had been about "HashMap's get()" and "HashMap's containsValue()", which rather obviously excludes any overriding methods from other classes. When I said "any HashMap implementation" that's a little more ambiguous by itself, but what I meant was consistent with what I'd said earlier - an implementation of HashMap itself, not a subclass. And so I immediately clarified the statement I'd just made to ensure it was clear what I meant.
    See Max, we really can't put all relevant qualifiers on every single sentence uttered. If I've clearly been talking about the HashMap class itself, let's try not to forget context when I omit a qualifier in one whole sentence, then reestablish it the next.
    Actually, Hashmaps might very well implement this in the future, and be 100% spec compliment, so this is an unfounded statement.
    The HashMap class itself? Are you talking about something actually planned, or a theoretical possibility? It would surprise me greatly if Sun put this in HashMap itself, as it would roughtly double the memory allocation for HashMap, for something that isn't generally required. They could well provide a DoubleMap (or BidirectionalMap or whatever they choose to call it), which might well have a couple HashMaps inside, but I stongly doubt they (Sun) would extend HashMap for this. They'd implement Map, but hide the implementation from the API. No extends. Even if they did use extends, we wouldn't be talking about HashMap's containsValue(), we'd be talking about the containsValue() of a HashMap subclass. I acknowledge that (except for the last sentence) this is just opinion on my part, not backed up by the API, and I've been careful to state it as such. All along, where appropriate.
    [Jim]: Are you willing to bet your certification fee that it isn't? For that matter, are you willing to bet the certification fee of the people you're advising?
    [Jim]: For Map: no, I misspoke. For HashMap (from the first half of the paragraph you said is "just nonsense": yes, if there were something to be gained if I win (e.g. if you'd but against me), and if we could agree on a suitable means of resolving the issue.
    [Max]: I don't understand what you're saying here?

    I'm saying I've retracted the part about Map which you objected to. Concentrating on HashMap though: I assert that HashMap's get() is more efficient than HashMap's containsValue(). If you're asking if I would be willing to be money that it's OK to make a statement like this, and if we could arrange a suitable way of determining the outcome, the answer would be yes.
    [Jim]: I suppose my assertion about future releases from Sun is unresolvable unless we were to limit the scope - say, up through JDK 1.5 when it's released?
    [Max]: It's not unresolvable at all. My assertion is that there is no way you could know. AFIK, unless you're in on discussions I'm not, there is no way you could know . And if you don't know it to be true then, as a Engineer, you shouldn't assert it as a fact

    Well, my statment about future releases was "Well for our universe, for any existing HashMap implementation (or any Sun will ever release, I'll bet), containsKey() is more efficient than containsValue()." "I bet" means that I recognize that the part about future releases isn't something we can establish as a fact or not, now. The part about future releases (the part you were evidently questioning) is not something I was asserting as fact, but it's something I feel is very probably true, and would be willing to be on. That seems pretty clear - what else does "I bet" mean? What additional qualifiers would you have liked on that statement? Now the part about "any existing HashMap implementation" - that part I was asserting as fact. Unfortunately there's some ambiguity in the phrase "HashMap implementation". Previous context, and subsequent reassertion, was that I meant the java.lang.HashMap class, as implemented in any JDK (from Sun or no). I don't know of standard terminology for this concept in Java - perhaps "proper HashMap?". Got a prefered way of saying this?
    Let's try this. I'm a programmer. I've got a program in which I've instantiated an object which I know to be a HashMap. An actual honest-to-god HashMap. I wish to make the observation that its containsKey() method is generally more efficient than its containsValue() method. Is there some way to say this simply without having to spend thousands of words justifying it later? Or should good engineers just never mention the possibility that containsKey() is faster than containsVlaue(), as it's just too complex and dangerous?
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Jim]: Obviously, I can't bet the money of others, but I am indeed willing to encourage others to bet their money against you, and suffer their ire if I'm wrong.
    [Max]: Well, ok, but you can see where I might not be comfortable with that?

    OK. But these are adults here after all.
    [Jim]: I suspect that if we pursue this we'd have a hard time agreeing on an exact formulation of what we're betting on. My basic premise is this: if an engineer says that Map containsKey() is more efficient than containsValue() (in casual conversation, and note the lack of the word "always"), that's OK, they're basically right.
    [Max]: in casual conversation, yes, I agree. They might assert anything, foolish or not, an casual conversation.

    OK, let's talk about in casual conversation in which one does not wish to be thought a fool. Did Philippe's statement make you think he was a fool? What about my statements?
    [Jim]: And if another engineer immediately qualifies
    [Max]: I would say corrects

    True. I explicitly replaced the original statement with an alternate formulation.
    [Jim]: the statement to say that HashMap's get() is more efficient than containsValue(), that's even more right, and making such an assertion is hardly something to be avoided; it should be seen as self evident.
    [Max]: Ah, but there's the rub. It's not more right in general, it's only more right in specific cases.

    Are you arguing with Philippe's statment about Map, or mine about HashMap? I have no argument about the former; I acknowledged in my first post on this topic that his statement wasn't strictly true for Map, but was true for specific implementations. I made it clear early on that I was talking about the get() and containsValue() of those implementations. Or so I thought. Is there another way I should have worded it? The specific cases I was talking about are pretty dang common. We were talking about a context where the programmer knows exactly what class he's instantiated. He may not know what platform it will run on, but he knows it's a java.util.HashMap, not a com.influxs.util.HashMap.
    [Max]: I could(or you could) code up an implementation in about 10 minutes that has a containsValue that's just as fast as a get. To me, this just isn't a responsible statement.
    But it wouldn't be HashMap's get(), or HashMap's containsValue(). Which is how I referred to those methods. I suppose you could make your own JDK by taking the reference implementation, modifying it, and releasing it as a new JDK. But you haven't alrady done that, have you? And I'm betting no one has. If they do, it's a future release. Note that I did limit my point about future releases to those that came from Sun (though I'd be happy to include some other big companies if you like). I'm well aware someone could cook up their own version if they wanted. But (a) they haven't already done so, because there was no need prior this discussion. Subsequent to this discussion, you might see a tiny need to do it, sure,and since it's pretty easy, you might do it just to prove a point. That's why I made sure to rule out that possibility. A future release from you doesn't count.
    [Max]: Well, I think you would be right, if this weren't an implementation issue. However, since that's all it is, then I'm right
    Well, it seems we have a failure to agree on what we were talking about in the first place. I thought my initial comments directing our attention to HashMap (and TreeMap) were pretty clear; guess not. Oh well. I guess my interest in most of this discussion has diminished, except for the last paragraph in my previous post. If, in the future, someone wants to make a really simple assertion assertion along the lines of "get() is more efficient than containsValue()" - is there some level of qualifiers that they can insert that would allow them to make this simple statement without it being a big issue?
    [ September 16, 2003: Message edited by: Jim Yingst ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Jim Yingst:
    [QB][Jim]: It's generally true, but yeah, there can be exceptions at the Map level.
    [Max]: Ok, we're getting there

    This isn't something new,

    Well, it's an adjustment anyway. But really, I don't care about it: the whole point of the statement was to rib you. I find it ironic, given the Goode vs. Eveile thread, that you're supporting arguments that are 'generally true'. Irony makes me laugh, laughter puts me in a good mood, and a good mood makes me tease people.

    [Max]: So, classes that extend HashMap are not HashMaps? Holy Convenient-if-Un-Object-Oriented-Qualifications Batman!
    See Max, we really can't put all relevant qualifiers on every single sentence uttered. If I've clearly been talking about the HashMap class itself, let's try not to forget context when I omit a qualifier in one whole sentence, then reestablish it the next.

    Jim, this may have been clear to you: From my perspective, it seemed like you were collectively dismissing all classes that might extend HashMap.


    Actually, Hashmaps might very well implement this in the future, and be 100% spec compliment, so this is an unfounded statement.
    The HashMap class itself? Are you talking about something actually planned, or a theoretical possibility?

    I guess all possibilities are theoretical at some point, But I mean just a possibility. At some point, as the average amount of memory increases, the tradeoff to memory to speed will be worth it, IMO.

    It would surprise me greatly if Sun put this in HashMap itself, as it would roughly double the memory allocation for HashMap, for something that isn't generally required.

    Remember, maps aren't really that big internally: all they store are references, not the actual objects, so you're talking about roughly doubling the number of references, not the actual items.
    Well, 'required' is a tough term to qualify: greater speed is always desirable. IMO, a slightly different implementation of Map that was more memory intensive but faster for containsValue is not unreasonable. More to the point, you shouldn't base your architecture around the premise that it can't happen.

    [b][Jim]: Are you willing to bet your certification fee that it isn't? For that matter, are you willing to bet the certification fee of the people you're advising?
    [Jim]: For Map: no, I misspoke. For HashMap (from the first half of the paragraph you said is "just nonsense": yes, if there were something to be gained if I win (e.g. if you'd but against me), and if we could agree on a suitable means of resolving the issue.

    I'm not butting anything against you partner: you're looking for a different kind of cowboy there
    Let's try this. I'm a programmer. I've got a program in which I've instantiated an object which I know to be a HashMap. An actual honest-to-god HashMap.

    Have you given up you wayward ways then, and joined the Forces Of Light?

    I wish to make the observation that its containsKey() method is generally more efficient than its containsValue() method. Is there some way to say this simply without having to spend thousands of words justifying it later? Or should good engineers just never mention the possibility that containsKey() is faster than containsVlaue(), as it's just too complex and dangerous?

    How about "In general, I believe that Hashmap.get() is faster then hashmap.containsValue(). I can't prove this, and would never assert it as anything more then my belief..".
    All best,
    M
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Originally posted by Jim Yingst:
    [Jim]: Obviously, I can't bet the money of others, but I am indeed willing to encourage others to bet their money against you, and suffer their ire if I'm wrong.
    [Max]: Well, ok, but you can see where I might not be comfortable with that?

    OK. But these are adults here after all.

    Did I accuse anyone of not being an adult? This seems a bit of a left field statement..


    [Jim]: I suspect that if we pursue this we'd have a hard time agreeing on an exact formulation of what we're betting on. My basic premise is this: if an engineer says that Map containsKey() is more efficient than containsValue() (in casual conversation, and note the lack of the word "always"), that's OK, they're basically right.
    [Max]: in casual conversation, yes, I agree. They might assert anything, foolish or not, an casual conversation.

    OK, let's talk about in casual conversation in which one does not wish to be thought a fool. Did Philippe's statement make you think he was a fool? What about my statements?

    This is bit of a strawman. I never said(or thought, for that matter) either of you fools: I just said that casual conversation allows foolish statements, as well as other types.

    [Jim]: And if another engineer immediately qualifies
    [Max]: I would say corrects

    True. I explicitly replaced the original statement with an alternate formulation.
    [Jim]: the statement to say that HashMap's get() is more efficient than containsValue(), that's even more right, and making such an assertion is hardly something to be avoided; it should be seen as self evident.
    [Max]: Ah, but there's the rub. It's not more right in general, it's only more right in specific cases.

    Are you arguing with Philippe's statment about Map, or mine about HashMap?

    I'm saying HashMap could change in JDK 1.4.3.

    [Max]: I could(or you could) code up an implementation in about 10 minutes that has a containsValue that's just as fast as a get. To me, this just isn't a responsible statement.
    But it wouldn't be HashMap's get(), or HashMap's containsValue(). Which is how I referred to those methods. I suppose you could make your own JDK by taking the reference implementation, modifying it, and releasing it as a new JDK. But you haven't alrady done that, have you? A future release from you doesn't count.

    Why not?


    [Max]: Well, I think you would be right, if this weren't an implementation issue. However, since that's all it is, then I'm right
    Well, it seems we have a failure to agree on what we were talking about in the first place. I thought my initial comments directing our attention to HashMap (and TreeMap) were pretty clear; guess not. Oh well. I guess my interest in most of this discussion has diminished, except for the last paragraph in my previous post. If, in the future, someone wants to make a really simple assertion assertion along the lines of "get() is more efficient than containsValue()" - is there some level of qualifiers that they can insert that would allow them to make this simple statement without it being a big issue?
    [ September 16, 2003: Message edited by: Jim Yingst ][/qb]

    Sure: just don't make me worry about you by implying that you'd actually put it in your choices document
    M
    [ September 16, 2003: Message edited by: Max Habibi ]
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Max,
    How about "In general, I believe that Hashmap.get() is faster then hashmap.containsValue(). I can't prove this, and would never assert it as anything more then my belief..".

    It's not a Belief point. As Jim said, we use SUN's JDK. I read HashMap.java sources, compared what get(), containsKey() and containsValue() do, I didn't prove that containsKey() is faster than containsValue(), I just noted that it must be, at least if the given HashMap is not empty or has a so little size() that any difference in speed couldn't be noticed.
    Best,
    Phil.
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    Hi Philippe,
    Your 'proof' is a non sequitur. If you read Jim's statement, he said

    If, in the future, someone wants to make a really simple assertion assertion along the lines of "get() is more efficient than containsValue()" - is there some level of qualifiers that they can insert that would allow them to make this simple statement without it being a big issue?

    He didn't say anything about correctness, proof, etc. He's just asking me if I'm going to force him to qualify every single statement he makes from here to eternity. I'm reply that no, I'm not. I was just concerned about this statement. Jim's a terrific and experienced developer, and one of the fairest people I've ever met.
    Correspondingly, when he makes a statement, people listen(I know I do). In this case, even though I know what he meant, I was concerned (correctly, it seems), that others could take his statements to be more then I think he meant for it to be. To that point, I'm trying to clarify my perspective of the statement. As the co-moderator of this forum, this is how I choose to balance information here.
    M
    Tony Collins
    Ranch Hand

    Joined: Jul 03, 2003
    Posts: 435
    I supose arguing about HashMaps is better than arguing about the smaller things in life like politics and religion. I might actually learn something.
    Tony
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Most of this was written earlier today, prior to some of the edits in the above posts, but I got interrupted. So now some of this is out of date, or out of sequence, but hopefully it's not too confusing. Most importantly, most of this was written before Max can out and said "the whole point of the statement was to rib you." But I'm too tired to re-edit now, so what the heck...
    [Jim]: OK. But these are adults here after all.
    [Max]: Did I accuse anyone of not being an adult?

    Nope. I'm just saying that to some extent everyone here need to use their own judgement anyway about how to apply advice they receive here from others; this is no exception. But I understand your point that we should beware that if we give bad advice, we're affecting others, not just ourselves; I agree. We just can't always agree on what's good advice.
    [Max]: This seems a bit of a left field statement..
    Hey, I'm left-handed. Are you implying theres something wrong with that? Them's fighting words!
    Oh, sorry - thought we were in MD for a second there. Nevermind...
    [Max]: This is bit of a strawman. I never said(or thought, for that matter) either of you fools: I just said that casual conversation allows foolish statements, as well as other types.
    Ok cool, just checking.
    [Max]: I'm saying HashMap could change in JDK 1.4.3.
    Are you saying you have reason to believe this is likely, or just that it's possible? If it's the latter I believe I'm covered by the "I bet" clause. If the former, I'd be very interested in hearing details.
    [Jim] A future release from you doesn't count.
    [Max]: Why not?

    Because in the statement under discussion, I specifically referred to future releases from Sun. Because (a) I knew this loophole was possible, but (b) I don't think that engineers worrying about performance issues should have to concern themselves with the possibility that their employer would decide to run their software on a custom JDK which is designed specifically to disprove this one point. If Bodgett & Scarper (or URLyBird, is that the company name?) want to do something like that, I don't think it's reaonable for me to worry much about it. While you may disagree with my motives, I did put the appropriate qualifier in ("from Sun") in the same sentence I talked about future releases. Unless JDK 1.4.3 (or even 1.5) does come out with a radically different HashMap implementation, I think I'm pretty well covered here.
    Sure: just don't make me worry about you by implying that you'd actually put it in your choices document
    Well, not unless it seemed more important to my design than it actually is here. But take FileChannel vs. RandomAccessFile. (No, I'm not trying to bring up our points of contention regarding atomiciy.) We agree that, speaking generally, FileChannels are pretty fast, and likely to be a lot faster than RandomAccessFile, right? I know that this isn't the only reason to choose them, and speed is far from paramount. But efficiency isn't a complete non-issue, as long as you don't let it override simplicity and maintainability. So, in my design doc, can I at least mention that one reason to favor FileChannel is that is (generally) really fast? As far as I know there aren't any concrete guarantees about this; the API avoids that sort of thing. I won't say "really fast" of course. Perhaps something like "FileChannel and related NIO classes often achieve signficant performance benefits compared to RandomAccessFile or streams." I mean, I could add "on most commonly-available implementations..." etc, but isn't that kind of overkill?
    I have a book here where the authors say "FileChannels' interaction with the file system is much more controllable and efficient than the previous I/O implementations." It doesn't say "usually". It doesn't say "except maybe for some rare exceptions from obscure custom-made JDKs that no one will ever use anyway". I mean, I could release a JDK in which any FileChannel read() takes at least full second longer than RandomAccessFile's read() method does. I don't belive this would actually contradict the API in any way, but perhaps I've overlooked something. Barring that possibility, I guess the statement isn't strictly true. Shall I notify the authors that their book is dangerously misleading on this point? Maybe I can get them to add: "...but don't ever think of mentioning this in your design doc."
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    And now more recently-written stuff. (Had to split posts; I ran out of smilies.)
    [Max]: I find it ironic, given the Goode vs. Eveile thread, that you're supporting arguments that are 'generally true'. Irony makes me laugh, laughter puts me in a good mood, and a good mood makes me tease people.
    Ah, I see. From my perspective the difference is: if someone says X is better than Y, and I agree that it usually is, that's cool, and I won't disagree unless the differences seem potentially relevant. Or unless it's Max. Whereas if someone says that X is Right and Y is Wrong, always and calls than absolute, then I'm going to be much more critical and argumentative.
    Jim, this may have been clear to you: From my perspective, it seemed like you were collectively dismissing all classes that might extend HashMap.
    OK, I'll endeavor to make that sort of thing more clear if it comes up in the future. I know some OO gurus use special terms to distinguish between class A itself vs. class A and its sublclasses vs. class A's sublclasses (but not A itself). I don't think Java has a standard term for it. I've generally been using something like "A's foo()" to refer to the implementation of foo() which was declared in A. I guess the need for the distinction doesn't come up that often; I can use a few extra words when it does. At least when the subject first comes up; hopefully not in every single subsequent sentence.
    I guess all possibilities are theoretical at some point, But I mean just a possibility. At some point, as the average amount of memory increases, the tradeoff to memory to speed will be worth it, IMO.
    Well possibly. But note that this also slows down the standard put() as it forces you to calculate two hashCode()s. And it also either (a) prevents you from allowing two keys to map the the same Value, since that's not reversable in a one-to-one Map, or (b) forces you to slow things down a little bit more as the reverse Map has to map each Value to a List of possible keys. And the speed benefit you talk about is to containsValue() only, not the much more commonly used methods like put() and get() and containsKey(). So I still don't think they would ever put changes like this directly in HashMap. Would be fun if we could bet on this and then, say, peek into the future ten years from now to see if it ever happened. Errr, happens. Will happen. Dang, I confused myself again...
    Remember, maps aren't really that big internally: all they store are references, not the actual objects, so you're talking about roughly doubling the number of references, not the actual items.
    True, good point. If the contents are little things like Integers then the size of the data structure with its references is significant (though me saying "double" was still a mistake, that should be an upper bound I think). If the contents are larger objects like Records or whatever, then the size of the map itself is that much less significant.
    Well, 'required' is a tough term to qualify: greater speed is always desirable.
    To me, you're talking about much greater speed for a very rarely-used method, at the cost of somewhat reduced speed for the methods that are used all the time. Not "always desireable" in this situation, IMO.
    IMO, a slightly different implementation of Map that was more memory intensive but faster for containsValue is not unreasonable. More to the point, you shouldn't base your architecture around the premise that it can't happen.
    How about basing it on the premise that it's pretty darn unlikely, and even if it does, it's not as though our architecture has become crippled, it's just not quite as fast as an alternative design which we had discarded based on a premise that turned out not to be true?
    [Jim]: Have you given up you wayward ways then, and joined the Forces Of Light?
    Couldn't tell what this was in reference to. Though for a good answer is "probably not".
    [Max]: How about "In general, I believe that Hashmap.get() is faster then hashmap.containsValue(). I can't prove this, and would never assert it as anything more then my belief..".
    Eh, too wimpy. How about "All commonly-available versions of java.lang.HashMap implement containsValue() in a method whose performance is O(n) and implement containsKey() and get() using methods that are O(1). So unless you know for sure that you're using some other class that offers O(1) performance for containsValue(), you're well-advised to stay away from containsValue() unless you really need it, or you really don't care about performance at all. And even if you do need it, you'd be much better off implementing your own Map which guarantees better performance for containsValue(), using two HashMaps internally for fast lookup in either direction. Otherwise, containsValue() just plain sucks."
    OK, that was longer than originally intended. Maybe just the first sentence or two is sufficient for most cases. But you get the idea.
    [Max]: Jim's a terrific and experienced developer, and one of the fairest people I've ever met.
    Aww, now how am I supposed to stay irritated with Max after that?
    [Tony]: I supose arguing about HashMaps is better than arguing about the smaller things in life like politics and religion. I might actually learn something.
    Plus I'd like to think that (a) we're unlikely to actually go to war over this, and (b) we have a relatively good chance of actually resolving something. Which is more than can be said for some other discussions.
    Cheers...
    [ September 16, 2003: Message edited by: Jim Yingst ]
    Max Habibi
    town drunk
    ( and author)
    Sheriff

    Joined: Jun 27, 2002
    Posts: 4118
    /*
    Originally posted by Jim Yingst:
    [Max]: I'm saying HashMap could change in JDK 1.4.3.
    Are you saying you have reason to believe this is likely, or just that it's possible? If it's the latter I believe I'm covered by the "I bet" clause. If the former, I'd be very interested in hearing details.

    As I said, it could change in jdk 1.4.3. No, I haven't spoken to anyone @ Sun about this(though I may bring it up the next time I talk to those guys, just to get a rise out of you). It's not a good idea, IMO, to design for a particular implantation of the JDK.

    Sure: just don't make me worry about you by implying that you'd actually put it in your choices document
    Well, not unless it seemed more important to my design than it actually is here. But take FileChannel vs. RandomAccessFile. (No, I'm not trying to bring up our points of contention regarding atomiciy.) We agree that, speaking generally, FileChannels are pretty fast, and likely to be a lot faster than RandomAccessFile, right? I know that this isn't the only reason to choose them, and speed is far from paramount. But efficiency isn't a complete non-issue, as long as you don't let it override simplicity and maintainability. So, in my design doc, can I at least mention that one reason to favor FileChannel is that is (generally) really fast? As far as I know there aren't any concrete guarantees about this; the API avoids that sort of thing. I won't say "really fast" of course. Perhaps something like "FileChannel and related NIO classes often achieve signficant performance benefits compared to RandomAccessFile or streams." I mean, I could add "on most commonly-available implementations..." etc, but isn't that kind of overkill?

    I actually read the above @ Sun, almost verbatim.

    I have a book here where the authors say "FileChannels' interaction with the file system is much more controllable and efficient than the previous I/O implementations." It doesn't say "usually". It doesn't say "except maybe for some rare exceptions from obscure custom-made JDKs that no one will ever use anyway". I mean, I could release a JDK in which any FileChannel read() takes at least full second longer than RandomAccessFile's read() method does. I don't belive this would actually contradict the API in any way, but perhaps I've overlooked something. Barring that possibility, I guess the statement isn't strictly true. Shall I notify the authors that their book is dangerously misleading on this point? Maybe I can get them to add: "...but don't ever think of mentioning this in your design doc."

    Please do, I'm sure the authors will that note all due attention . Regardless of the merits of point, your example is in the wrong category here. That is, there is no actual benefit to just slowing FileChannels down. There is, however, a reasonable benefit to a different HashMap implementation in order to offset it's memory usage.
    M
     
    I agree. Here's the link: http://aspose.com/file-tools
     
    subject: Single table / Simple Locking - WeakHashMap vs WeakReferences