GeeCON Prague 2014*
The moose likes Developer Certification (SCJD/OCMJD) and the fly likes Making the Data Class a Singleton & Thread Safe Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)
Bookmark "Making the Data Class a Singleton & Thread Safe" Watch "Making the Data Class a Singleton & Thread Safe" New topic
Author

Making the Data Class a Singleton & Thread Safe

Sean Keane
Ranch Hand

Joined: Nov 03, 2010
Posts: 581

A common technique that people seem to follow is to make their Data class a Singleton and store the records in a cache inside of the Data class.

I have followed this approach too. But now I think my implementation is fundamentally flawed. I've put a thread up on the Threads forum to see what the guys there think. Maybe someone here who has followed this approach may have some input too?

The basic reason why I think it is flawed is as followed.

  • 1) Your CRUD methods which are specified in the DB interface will be implemented as instance methods in the Data class.
  • 2) Your getInstance() method that you will need in the Data class to get a handle on the Singleton instance will be a static method.


  • Now, if you make your cache an instance variable of the Data class, then you have your static getInstance() accessing a non-static variable.

    So your CRUD methods will be using the lock associated with the instance of the Data class, whereas the getInstance() method will be using the lock associated with the Class itself. Therefore two threads can simultanously be executing the getInstance() method and a CRUD method. This could leave the cache in an invalid state.

    The reason the cache could be left in an invalid state in my implementation is because I have two getInstance() methods:

  • public synchronized static DBExtension getInstance(final String databaseLocation)
  • public synchronized static DBExtension getInstance()


  • The first one that takes the databaseLocation parameter will initialise the cache and return you the Singleton instance. The second one will return the Singleton instance without initialisation the cache - it will also throw an Exception if the cache is not initialised.

    So it is possible in my implementation for Thread-A to be executing one of the CRUD methods to be suspended, and in the meantime Thread-B could call getInstance(databaseLocation) causing the cache to be reloaded with possible different content...then Thread-A begins executing again unaware that the contents of the cache has changed underneath it.

    Is this design in general fundamentally flawed? Or it is just my implementation that is flawed, and there is a way to implement this design so that it is guaranteed to be 100% thread safe?


    SCJP (1.4 | 5.0), OCJP (6.0), OCMJD
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Thinking about this some more. I'm guessing what other people have done is they have two methods like follows in their Data class:

    * public synchronized static DB getInstance()
    * public synchronized DB init()

    So they will return a uninitialised instance of the Data class from the getInstance() method - i.e. the cache is not loaded and any CRUD operations will fail.

    They then use the instance method init() to initialised the instance of the Data class - i.e. load the records from the file into the cache.

    This seems quite messy. I don't like the idea of returning an uninitialised instance of the Data class. But I'm guessing this is what other people are doing and they have passed successfully.

    I still wonder though is it possible to come up with a thread safe solution where I can use pass the database location into my getInstance() method, load the cache, and return an initialised instance of the Data class.

    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    i don't know exactly how you implemented your class, but i have a get instance that return a new instance
    and everything run in this (single) instance..
    of course getinstance should always return the one and same instance..


    SCJP 5, SCWCD 5, SCBCD 5
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Jonathan Elkharrat wrote:i don't know exactly how you implemented your class, but i have a get instance that return a new instance
    and everything run in this (single) instance..


    Are you using a cache in your Data class? If so, would I be correct in guessing that when you call getInstance() for the very first time it will return you an instance of your data class where the cache is not loaded?
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    i'm loading the cache in the constructor (there's not an empty constructor)
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Your Data class is a singleton right? So how does your private constructor get the location of the database file?
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    at first i hardcoded it but now i'm using a getInstance with parameter.

    you can also do this:
    getInstance(){
    return getInstance("db-1x1.db");
    }
    getInstance(String dbName){
    ,,,
    }
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Ok, well hard-coding is obviously a non-runner . Now you basically have the same set-up as I described in my original post.

    So how did you get around the problems I am discussing? i.e. you have a static getInstance() method and non-static CRUD methods all operating on a shared variable (the cache), so depending on your implementation it is possible for your cache to be corrupted. Also, depending on how you have set-up your code it's possible your implementation is not thread-safe.
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    there's no reason for the cache to get corrupted.
    either if you synchronize on this or on specific lock, as long as you have only one instance
    of the DAO you'll have only one instance of lock and only one operation at a time...

    why do you think it'll cause a problem??
    Sean Keane wrote: So your CRUD methods will be using the lock associated with the instance of the Data class, whereas the getInstance() method will be using the lock associated with the Class itself. Therefore two threads can simultanously be executing the getInstance() method and a CRUD method. This could leave the cache in an invalid state.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Jonathan Elkharrat wrote:there's no reason for the cache to get corrupted.
    either if you synchronize on this or on specific lock, as long as you have only one instance
    of the DAO you'll have only one instance of lock and only one operation at a time...

    why do you think it'll cause a problem??


    Your static getInstance() method will use the Class lock. Your instance methods (read(), update(), etc.) will use the lock on the object. So it is possible for two separate threads to operate on your cache at one time as I explained in my original post. Also take a look at the post that I linked to in my original post.
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    my getInstance doesn't operate on the cache at all.
    (it indeed create one but just the first time called, which mean there's not an instance
    of DAO anywhere anyway)

    your getInstance(String) is problematic.
    it always init a database...

    (to be honest i just look at the code, didn't read the whole topic)
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Ok, so is your code something like this:
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    was, to be honest. since i added the parameter, i created a factory for the DAO.

    but your client doesn't (and shouldn't) know if the DAO is already initialized.
    if you hardcode the dbName then it would look like this:


    but i think the spec want the location to be specified by the user so try to find a fix
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:So it is possible in my implementation for Thread-A to be executing one of the CRUD methods to be suspended, and in the meantime Thread-B could call getInstance(databaseLocation) causing the cache to be reloaded with possible different content...then Thread-A begins executing again unaware that the contents of the cache has changed underneath it.

    That's true (and I already mentioned that in the other thread ). And it's even scarier than that, because class and instance locks are not blocking each other, so thread-B can be swapped out in the middle of the initializing process (with same database file) and then Thread-A starts again. Ouch!
    But why would you allow your single instance to be initialised more than once That's a bit contradictory with having just 1 single instance. You have only one instance but it could be loaded with different data.

    Sean Keane wrote:
    So they will return a uninitialised instance of the Data class from the getInstance() method - i.e. the cache is not loaded and any CRUD operations will fail.

    They then use the instance method init() to initialised the instance of the Data class - i.e. load the records from the file into the cache.

    This seems quite messy. I don't like the idea of returning an uninitialised instance of the Data class.

    It might be messy, but it doesn't violate the principle Each individual method should have high cohesion. The getInstance method returns an instance, the init-method initializes this instance. The name of the getInstance with dbLocation-parameter should actually be getAndInitializeInstance. Of course the init-method has to handle the same issue (do not initialize the instance more than once), but it has the benefit of being an instance method and thus other methods are blocked as long as this method is executing.


    SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
    http://www.javaroe.be/
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Jonathan Elkharrat wrote:if you hardcode the dbName then it would look like this:

    If you want to pass the certification without resubmitting I would not hard-code the dbName
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Reading back over my original post. I was trying to figure out why I thought I had a problem here - my original code example is posted on this thread here. I was confused why I had a problem, because the code I posted in response to Jonathan is actually what I want!

  • The code appears to be thread-safe.
  • It appears like it is not possible for the static and non-static methods to operate on the Map at the same time.
  • You always get a Singleton instance back that has the cache loaded.

  • So using the code below I can now write:

    It's blatantly obvious to me now what my original problem was. I was allowing the cache to be reloaded:

    When I could have prevented it simply with this implementation here:

    But my worry with this approach is that it is not thread safe because I am lazy-loading the Singleton instance. I think my fear of it not being thread-safe comes from examples that you see about double-check locking AND lazy-loading.

    However, my example code here is not using double-check locking. So I cautiously confident that the code is thread-safe. What do you guys think?
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel De Nijs wrote:
    Sean Keane wrote:So they will return a uninitialised instance of the Data class from the getInstance() method - i.e. the cache is not loaded and any CRUD operations will fail.

    They then use the instance method init() to initialised the instance of the Data class - i.e. load the records from the file into the cache.

    This seems quite messy. I don't like the idea of returning an uninitialised instance of the Data class.

    It might be messy, but it doesn't violate the principle Each individual method should have high cohesion. The getInstance method returns an instance, the init-method initializes this instance. The name of the getInstance with dbLocation-parameter should actually be getAndInitializeInstance. Of course the init-method has to handle the same issue (do not initialize the instance more than once), but it has the benefit of being an instance method and thus other methods are blocked as long as this method is executing.


    Design is always a balance and trade-off. So I'm not disagreeing with you here - as there is no 100% right answer. But you can always pull out another design philosophy from your briefcase to make it look like a proposed solution is incorrect.

    To me the following points are more important than cohesion:

    1) The API is intuitive
    - (i.e. documenting the behaviour in JavaDoc does not constitute making non-intuitive code intuitive in my eyes )
    2) The class has an invariant which is preserved
    - (my class invariant is that the cache must be loaded - at all times this must be true for an instance of my Data class to maintain correctness)
    3) The API is designed in such a way that it is easy for a developer to use and reduce the risk of errors or exceptions being thrown

    I feel these points are met by passing the database location into the getInstance() method and returning an instance of the Data class that has it's cache loaded.

    This solution also has an additional benefit over initialising the cache separately. If you initialise the cache separately you have to check if the cache has been loaded in every single method that operates on the cache. Whereas with my solution this check is isolated to one single point - I'm sure that design idiom has some nice name that I can't think of right now :p

    So all in all I think there are more benefits to returning an initialised instance of the Data class from the getInstance() method than there are from the other solution of returning an uninitialised instance of the Data class.

    The key point here for me is the class invariant. With both solutions we are saying that the class invariant is "the cache must be loaded in order to use an instance of this class". In my solution, this class invariant exists and is maintained from the moment the instance is created. But for the solution where an uninitialised instance is returned from the getInstance() method - the class invariant is not maintained. That is why you have messy code in ever single instance method.

    Apart from cohesion I see no other benefits from the approach of returning an uninitialised instance of the Data class from the getInstance() method - do you guys?
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:So I cautiously confident that the code is thread-safe. What do you guys think?

    It's indeed thread-safe, but due to the synchronized keyword it's an "expensive" multi-threaded solution. I know performance is not an issue in this assignment (but that's why there is something like the double-checked locking idiom).
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    Roel De Nijs wrote:
    Jonathan Elkharrat wrote:if you hardcode the dbName then it would look like this:

    If you want to pass the certification without resubmitting I would not hard-code the dbName


    as i said before, i moved the singleton and parameter (db name) to a factory class..
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:1) The API is intuitive
    - (i.e. documenting the behaviour in JavaDoc does not constitute making non-intuitive code intuitive in my eyes )

    Why would your 2 getInstance-method be more intuitive? Let me try to think like a junior programmer: I learnt at school about the singleton design pattern and like in every text book the getInstance method has no parameter at all, so I'll use that method to get my instance. I have no clue why you would need another getInstance method (even with a parameter ) to get a simple instance.
    And to be clear: I don't say my alternative is more intuitive than yours. I just say that I need to read your API too to know how I'm expected to use it (just like you should do when you use someone else's library/api)

    Sean Keane wrote:That is why you have messy code in ever single instance method.

    You are entitled to do a whole lot, but don't call my code messy You may say it's more code and harder to maintain or add an extra nstance method (because you have to think about adding this check), but my code is clear and simple, not messy.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel, out of my three points I listed you missed the one I said was the most important . The API being intuitive is subjective. But the most important point is the class invariant.

    When I mention "your solution" here Roel, I don't mean yours personally. It's just shorter than typing an explanation of the entire solution. So apologies if I caused any offence.

    Setting up the state of an object after it is created and checking that the correct state exists before carrying out operations on the object is a very common approach . Same with calling the code messy, I'm not calling your code messy, it's the general design of checking the cache is loaded in every single method is messy in my eyes.

    Class Invariant

    The cache being loaded is an invariant of your class. An instance of your class will not operate without the cache being loaded. Yet your getInstance() method returns an instance of the class where the class invariant is not upheld. This seems like bad design to me.

    In every single method you are basically checking that the class invariant is upheld. Which is fine. But not necessary if you code it in such a way that the class invariant can never be broken - that is what the solution I am proposing does. It ensures the class invariant always exists and is never broken. This provides correctness in the program and avoids the possibility of errors occurring (like in your solution if an instance method is called on an uninitialised instance of the Data class).

    What Are The Benefits?

    As I mentioned in my previous post, I think there are more benefits to returning an initialised instance of the Data class from the getInstance() method than there are from the other solution of returning an uninitialised instance of the Data class.

    So, apart from cohesion I see no other benefits from the approach of returning an uninitialised instance of the Data class from the getInstance() method - do you guys?
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:Roel, out of my three points I listed you missed the one I said was the most important The API being intuitive is subjective. But the most important point is the class invariant.

    I didn't miss that one, I agree with the explanation of the class variant to use the 2 getInstance-method approach (maybe I had to mention this too in my previous post). But if it's the most important point, you had to put that one on the 1st place (and maybe even leave out the API being intuitive because that's a subjective one).

    Sean Keane wrote:So apologies if I caused any offence.

    No worries, I'm not offended
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel De Nijs wrote:
    Sean Keane wrote:Roel, out of my three points I listed you missed the one I said was the most important The API being intuitive is subjective. But the most important point is the class invariant.

    I didn't miss that one, I agree with the explanation of the class variant to use the 2 getInstance-method approach (maybe I had to mention this too in my previous post). But if it's the most important point, you had to put that one on the 1st place (and maybe even leave out the API being intuitive because that's a subjective one).


    Ha, I had actually thought about re-ordering the points to put the class invariant one first, but thought that was pretty clear from "The key point here for me is the class invariant" . But I think the API being intuitive would definitely seem to be a personal thing, so probably little to be gained from discussion on that.

    The main point of the class invariant argument is that your instance of the Data class always upholds the class invariant. Using a getInstance-method that takes the database location as an argument allows me to construct an instance of the Data class that upholds this class invariant.

    Whereas the solution of returning an uninitialised instance of the Data class from the getInstance-method, where you have to initialised it later, does not uphold the class invariant.

    The only positive I have seen about this second approach is the one you mentioned about cohesion - which I agree is important. But would you not agree that program correctness (class invariant etc.) is of more importance and benefit in the grander scheme of things. Thus is a more favourable solution to go with?

    Or if you were implementing this again would you still go with returning an uninitialised instance of the Data class, and if so, why ?
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    what about this solution?

    when use start the server he'll get a popup with a text input
    for a file name and a checkBox "use Default" that disable
    this text input and according to user choice one of the methods
    will get called:

    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Jonathan Elkharrat wrote:what about this solution?

    And how does the user know what's the default? And where should he/she put the default file, so your Data class will be able to read it?
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    if he received the application "as is" and don't want to mess with the parameter he'll use default.

    on the othe hand, if he know what's he doing he can change some parameters, like the db location..

    (it's like specifying a port. either you use the default that came with the program or you want to tweak
    it a little and then you specify your own parameter)
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Jonathan Elkharrat wrote:if he received the application "as is" and don't want to mess with the parameter he'll use default.

    That still doesn't answer my question: where should he/she put the database file, so your Data class will be able to read it?
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    Roel De Nijs wrote:
    Jonathan Elkharrat wrote:if he received the application "as is" and don't want to mess with the parameter he'll use default.

    That still doesn't answer my question: where should he/she put the database file, so your Data class will be able to read it?


    in the same folder, of course.
    when running ajar file any filename specified is looked in the current directory, i think...
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Jonathan Elkharrat wrote:in the same folder, of course.
    when running ajar file any filename specified is looked in the current directory, i think...

    I don't know that for sure. Anyhow I would not follow that approach, but if you decide to add this functionality you have to clearly mention it in your userguide (which name and location is expected for daabase file)
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    i really think it's the expected behavior.
    i'm wondering what happen if you run the appliction like that, does it read from the
    database bundled inside the jar? because it cannot write on it at shutdown.
    never thought of that..
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:that was pretty clear from "The key point here for me is the class invariant"

    I answered so much questions/threads the past 12-24 hours that I missed that one

    Sean Keane wrote:Or if you were implementing this again would you still go with returning an uninitialised instance of the Data class, and if so, why ?

    I didn't think about the class invariant when developing this the 1st time. It would certainly make me doubt more between these alternatives. I have done some extra reading on the "class invariant" and one of the characteristics is "Each operation on a class can assume it will find the invariant true on entry and must leave the invariant true on exit." and is your exit-method (when you write everything back to the file) not breaking the invariance. I know my exit-method would, because if you call this-method you can again not call any other method than the init-method.
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Jonathan Elkharrat wrote:i'm wondering what happen if you run the appliction like that, does it read from the database bundled inside the jar?

    The database file is not intended to be in the runme.jar and I don't have to worry about this issue at all, because the user simply selects the database file he wants to use and the Data class takes care of it. I don't use a hard-coded setting at all in my application.
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    oh, sorry, my mistake..

    there are 2 Jar files, one inside the other.
    anyway, when extracting the outer one you get a JAR file and a db in the same directory.

    (when a program throw you an error "couldn't find XXX" where's the first place you'll put
    that file? the instinct says in the same directory... )
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel De Nijs wrote:I have done some extra reading on the "class invariant" and one of the characteristics is "Each operation on a class can assume it will find the invariant true on entry and must leave the invariant true on exit." and is your exit-method (when you write everything back to the file) not breaking the invariance. I know my exit-method would, because if you call this-method you can again not call any other method than the init-method.


    What is your exit-method doing beyond writing the contents of the cache back to the file that only allows the init-method to be called afterwards? Why would you modify the cache in your exit method?

    Regards breaking the class invariant. You could look at this in two ways. When you call the exit-method the assumption would be that the program is exiting and it no longer matters that the class invariant is not upheld.

    Or, if you weren't comfortable designing an API with this assumption, then you could argue that the exit-method should not modify your cache - it should simply persist the contents of the cache to the file. In this setup, the exit-method should be named something else, a name that reflects what it is doing e.g. flush(), persist() or something similar.

    I like the second approach as it seems safer and more correct when it comes to program correctness.
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5300
        
      13

    Sean Keane wrote:What is your exit-method doing beyond writing the contents of the cache back to the file that only allows the init-method to be called afterwards? Why would you modify the cache in your exit method?

    Like the name indicates it's the method you call when you don't need the dao anymore. Adding an extra flush()-method would be a nice idea
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel De Nijs wrote:
    Sean Keane wrote:What is your exit-method doing beyond writing the contents of the cache back to the file that only allows the init-method to be called afterwards? Why would you modify the cache in your exit method?

    Like the name indicates it's the method you call when you don't need the dao anymore.


    Yes, but the way your API is set up, another person could have access to the instance of the Singleton, and would thus be left with a instance of the Data class that is in an invalid state. So it is not really an exit-method in the sense that the you can't guarantee that no one else has a handle on the Singleton and using it.

    Of course you can document the behaviour. But you can say that about anything - you can document any behaviour and say "well, you should have read the documentation" .
    Jonathan Elkharrat
    Ranch Hand

    Joined: Dec 31, 2008
    Posts: 170

    the server should be up and running for the eternity.
    it's like when you're surfing and the web team shut down the server for maintenance,
    your cart list is not valid anymore and the web page cannot do anything...
     
    GeeCON Prague 2014
     
    subject: Making the Data Class a Singleton & Thread Safe