I have decided to cache my records in an ArrayList at startup. Basically, I read the data file and make Contractor objects (I'm doing B&S). I then insert the Contractor objects into the ArrayList using a loop over the records in the data file. I've thought a lot about this, and I want to make sure my justifications sound reasonable to you guys. For my implemented method public String read(int recNO) I decided to read from my ArrayList instead of the actual data file. I did this to reduce the chance of errors from the db file being corrupted or from a problem reading the bytes (and it is quicker and easier to traverse an ArrayList). Here is my question: when updating a record, I think it would be easier to update the Contractor object contained in the ArrayList first, and then use that new data to update the database file. It would seem silly to update the database, empty out the ArrayList, and then reinsert the records all over again; that seems like an unnecessary, added step. (Then again, I could use .set(int index, Object element), but my first idea seems easier and more efficient.) What do you think? Thanks for the help!
[ November 20, 2004: Message edited by: Daniel Simpson ]
I also chose to use caching. In my case I read in all records, including those that are deleted. I also use the same cached record to store the record's locked status. The reason I chose to read in deleted records is that a locked record can be deleted, and the create function needs to traverse the file (in my case the cache) to find available record numbers, so it is convenient to read in deleted records. [ November 20, 2004: Message edited by: Inuka Vincit ]
Hey Inuka, I read my documentation more closely and I totally understand now why I need to include the deleted records: twice it says that they can be reused and overwritten by a newly created record. (Gotta look closely at those method comments..heheh.) So I have edited my post and left it at my first question: whether my solution is okay and won't cause any problems if I update my cache before the actual data file. [ November 20, 2004: Message edited by: Daniel Simpson ]
Hi, Daniel. About updating the cache first, before the data file... don't do it. Sooner or later there will be a case where the cache has been updated and, before the change is written out to the file, the IO fails, and the cache has to be reverted. It is much more convenient to do it the other way around: write to the file first, and update the cache only once the write has succeeded.
I also chose caching in my project. I used null for deleted records in the ArrayList. However, in the create method, instead of going ArrayList.indexOf(null), which is the easy way, I would suggest looking for a deleted record which is unlocked (the easy way could return a deleted record that is still locked). This is something you will come up against as you program further... it will make your life easier to find an unlocked record... you will see what I mean when you code the method.
In terms of updating the cache, you would use set(int index, Object o) to replace a record in cache with a new one. If adding a completely new record, you would use add.
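A minimal sketch of that ordering (file first, then set or add on the cache). The RecordFile interface and the WriteThroughCache class are hypothetical names made up for illustration; they are not part of the assignment's DB interface, and records are simplified to String arrays.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for whatever code writes one record to the data file. */
interface RecordFile {
    void write(int recNo, String[] fields) throws IOException;
}

/** Cache that is only changed after the file write has succeeded. */
class WriteThroughCache {
    private final List<String[]> cache = new ArrayList<>();
    private final RecordFile file;

    WriteThroughCache(RecordFile file) {
        this.file = file;
    }

    /** Appends a completely new record: file first, then add to the cache. */
    synchronized int create(String[] fields) throws IOException {
        int recNo = cache.size();
        file.write(recNo, fields);      // may throw; the cache is still untouched
        cache.add(fields.clone());
        return recNo;
    }

    /** Replaces an existing record: file first, then set on the cache. */
    synchronized void update(int recNo, String[] fields) throws IOException {
        if (recNo < 0 || recNo >= cache.size()) {
            throw new IndexOutOfBoundsException("no such record: " + recNo);
        }
        file.write(recNo, fields);      // may throw; the cache is still untouched
        cache.set(recNo, fields.clone());
    }

    /** Reads come from the cache only. */
    synchronized String[] read(int recNo) {
        return cache.get(recNo).clone();
    }
}
```

If the write throws, the method exits before the cache is touched, so there is nothing to revert.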
I don't think updating the cache first will be failproof for the data; could lose you some marks. [ November 20, 2004: Message edited by: Anton Golovin ]
Anton Golovin (firstname.lastname@example.org) SCJP, SCJD, SCBCD, SCWCD, OCEJWSD, SCEA/OCMJEA [JEE certs from Sun/Oracle]
Thanks, Anton! Your scenario makes perfect sense. I am going to use the ArrayList method, set(int index, Object element). So it will just overwrite that updated record at the specified record number. I'm changing my solution to that instead. Thanks. Also when you said:
I also chose caching in my project. I used null for deleted records in the ArrayList. However, in the create method, instead of going ArrayList.indexOf(null), which is the easy way, I would suggest looking for a deleted record which is unlocked (the first way one could find a locked deleted record.) This is something you will come against as you program further...
How would I go about finding out whether a record is locked or not? (Keep in mind I have not gotten to the locking part yet, nor have I paid a whole lot of attention to the lock and unlock methods in my DB interface, so bear with me.) [ November 20, 2004: Message edited by: Daniel Simpson ]
First of all, the B&S assignment doesn't mention anything about performance, neither does it mention anything about very large record sizes.
Keeping this in mind, I was also tempted to create a caching mechanism, although the assignment gives no real reason to do it. I just suspected that better performance would be a good thing.
Now, after giving it some thought, I realized that implementing a foolproof caching system would be complex and consume lots of time. You have to find decent solutions to these problems:
- How many records are you going to cache at once? The whole file? What if the file gets very big? If you don't intend to cache the whole file, which algorithm are you going to use?
- What about updates? If you update the cache first and then the file, you will have to create a transaction with your client. Suppose the JVM crashes after the change has been written to the cache but before it is written to the file.
And I think there will be several other issues on the way..
Anyway, if you are going to implement the cache to increase performance (which isn't required by the assignment), you also cannot ignore the fact that the database file can become very big, and that will be the main problem with the cache. It's wrong to justify the cache implementation without mentioning the chance of a big database file....
Hi, Daniel. As you code your locking mechanism, you will probably use a data structure to hold record numbers and cookie values; then, by checking this structure, you will be able to determine if a record is locked or not...
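A minimal sketch of such a structure, assuming a lock method that returns a long cookie and a cache that uses null for deleted records as described earlier in the thread. LockTable and its method names are made up for illustration and are not taken from the assignment's interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lock table: record number -> cookie of the current owner. */
class LockTable {
    private final Map<Integer, Long> owners = new HashMap<>();
    private long nextCookie = 1;

    /** Blocks until the record is free, then locks it and returns a cookie. */
    synchronized long lock(int recNo) throws InterruptedException {
        while (owners.containsKey(recNo)) {
            wait();
        }
        long cookie = nextCookie++;
        owners.put(recNo, cookie);
        return cookie;
    }

    /** Releases the lock if the cookie matches, and wakes up waiting threads. */
    synchronized void unlock(int recNo, long cookie) {
        Long owner = owners.get(recNo);
        if (owner == null || owner != cookie) {
            throw new SecurityException("record " + recNo + " is not locked with this cookie");
        }
        owners.remove(recNo);
        notifyAll();
    }

    /** A record is locked exactly when it has an entry in the map. */
    synchronized boolean isLocked(int recNo) {
        return owners.containsKey(recNo);
    }

    /** First deleted (null) cache slot that is not locked, or -1 if none. */
    synchronized int firstFreeSlot(List<String[]> cache) {
        for (int i = 0; i < cache.size(); i++) {
            if (cache.get(i) == null && !owners.containsKey(i)) {
                return i;
            }
        }
        return -1;
    }
}
```

Unlike ArrayList.indexOf(null), firstFreeSlot skips a deleted slot that some client still holds a lock on.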
1. The DB can become very large, larger than memory can hold. But also note this: the cache used in this project mimics the paging behavior of a sophisticated db system. Caching the whole file eliminates the need to implement a complex paging subsystem and still gives reasonable performance. If you are talking about large data, there are actually other parts of the project that cannot handle a really large database either.
2. Updating is a problem, but your argument doesn't convince me. Cache or no cache, the disk IO may fail; that is not specific to the use of a cache. The real question is how we handle it when IO fails:
a. Exit the program: the result is the same with or without a cache.
b. Implement a rollback: undoing a partial write to disk is complicated either way, while rolling back just the cached data structure is easy, so the effort is almost the same for either method.
But my question about updates is this: what would happen when two programs run on separate JVMs and both use the same data file? E.g. client A is standalone, and client B is a networked client connected to localhost on the same machine, with the server using the same data file that client A is using. With a cache, client A won't be notified of the database change when client B changes it, and vice versa. This is the problem I am having now when I am testing my code.
But in my document, it says:
You may assume that at any moment, at most one program is accessing the database file; therefore your locking system only needs to be concerned with multiple concurrent clients of your server.
Therefore, I guess I may not need to worry about the separate JVM issue mentioned above. But in the case of one local client and one RMI client, should I consider them separate JVMs? Anton: do you have any suggestions? [ November 28, 2004: Message edited by: Andy Zhu ]
I don't think the markers or examiners are going to test your SCJD assignment with a very big data file with lots of records; they will just use the same data file you submitted with the assignment to make sure it works properly. And from memory the data file has fewer than 50 records; I think it was around 30 to 40.
But my questioning to update is this: what would happen in this situation: two programs run on separate jvm and both use the same data file, eg. client a is standalone, client b is a networked connect to localhost on the same machine with server using the data file as the client a is using. if using cache, client a won't be noted the database change when client b changed it. and verse vice. This is problem I am having now when I am testing my code
Well, as you said, that's really not an issue. The only way to solve this 'properly' is to implement some kind of 'progressive locking' at the physical level: you would create another column in the file that indicates whether a record is locked or not. Within a single JVM, a read or write already occurs atomically, meaning that you cannot write while you are reading and vice versa.
So you are safe from phantom reads. When you want to read record A and another user (using the same RMI server, so the database runs in the same VM) wants to update A, then you're going to get record A either before or after the update (there is a note on this, see **). But at least you're not going to get a mixture of the two.
Using progressive locking you simulate this across different JVMs. When you update a record you first write the 'locked' flag, next you update the record, and then you 'unlock' it. This way the other JVM will not read while you are updating (and no one will update while you are reading). So again, you will receive the record either before or after the update.
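A rough sketch of that flag-in-the-file idea using java.io.RandomAccessFile. The record layout (one flag byte followed by 16 data bytes) and the FlaggedRecordFile class are assumptions invented for illustration; they are not the B&S file format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical file layout: each record is 1 flag byte + DATA_LEN data bytes. */
class FlaggedRecordFile {
    static final byte UNLOCKED = 0;
    static final byte LOCKED = 1;
    static final int DATA_LEN = 16;             // assumed fixed record size
    static final int REC_LEN = 1 + DATA_LEN;

    private final RandomAccessFile raf;

    FlaggedRecordFile(RandomAccessFile raf) {
        this.raf = raf;
    }

    /** Writes the 'locked' flag, then the data, then clears the flag again. */
    synchronized void update(int recNo, String value) throws IOException {
        long pos = (long) recNo * REC_LEN;
        // copyOf pads with zero bytes or truncates to DATA_LEN
        byte[] data = Arrays.copyOf(value.getBytes(StandardCharsets.US_ASCII), DATA_LEN);
        raf.seek(pos);
        raf.writeByte(LOCKED);
        try {
            raf.seek(pos + 1);
            raf.write(data);
        } finally {
            raf.seek(pos);
            raf.writeByte(UNLOCKED);
        }
    }

    /** Refuses to read a record whose flag says it is being updated. */
    synchronized String read(int recNo) throws IOException {
        raf.seek((long) recNo * REC_LEN);
        if (raf.readByte() == LOCKED) {
            throw new IOException("record " + recNo + " is being updated");
        }
        byte[] data = new byte[DATA_LEN];
        raf.readFully(data);
        return new String(data, StandardCharsets.US_ASCII).trim();
    }
}
```

Note that checking the flag and setting it are separate steps here, so two JVMs could still race each other; a real cross-JVM solution would probably need OS-level file locking such as java.nio.channels.FileLock. The assignment text quoted above suggests none of this is required.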
** Speaking from a logical point of view, it's just a matter of where you draw the line. For instance:
You read record A, and directly afterwards someone updates record A. You have created an object which contains the record information, but it is not yet displayed... is the object out of sync with the db? Yes.
You read record A, and directly afterwards someone updates record A. You have created an object which contains the record information, and it is already displayed, so strictly speaking the data changed after it was displayed... was the object out of sync with the db? No.
In the first case you could re-read the record. In the second case you could not. However, I think that the first case is the same as the second: reading from a database gives you a 'snapshot'. You cannot expect that no one changes data that you have 'read'. You can, however, do certain checks if you want to update the data afterwards (you can check the consistency, and if it doesn't match, throw an error and re-read the record).
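A small sketch of that kind of consistency check, assuming the client passes back the snapshot it originally read along with the new values. SnapshotCheckedStore and StaleRecordException are hypothetical names, and the records live in a plain in-memory list for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical exception: the record changed since the caller read it. */
class StaleRecordException extends Exception {
    StaleRecordException(int recNo) {
        super("record " + recNo + " was changed by someone else; re-read it");
    }
}

/** In-memory store whose update only succeeds if the caller's snapshot is current. */
class SnapshotCheckedStore {
    private final List<String[]> records = new ArrayList<>();

    synchronized int add(String[] fields) {
        records.add(fields.clone());
        return records.size() - 1;
    }

    /** Returns a copy, i.e. a snapshot of the record at this moment. */
    synchronized String[] read(int recNo) {
        return records.get(recNo).clone();
    }

    /** Compares the caller's snapshot with the current record before writing. */
    synchronized void update(int recNo, String[] asRead, String[] newValues)
            throws StaleRecordException {
        if (!Arrays.equals(records.get(recNo), asRead)) {
            throw new StaleRecordException(recNo);
        }
        records.set(recNo, newValues.clone());
    }
}
```

A client that gets StaleRecordException would re-read the record, show the fresh values, and let the user decide whether to retry.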