This problem in not directly related to multiple threads, But I could not find any other appropriate place to put it. Please feel free to move the thread if there is any other relavent forum for it.
I have a third party product with its own database. My adaptor application is inserting records in one table of this database. The Primary Key is generated by database itself, but there is a column named external_key , wherein we are storing adaptor application generated primary key. Also if two request with same external_key are sent, 1st one is to be taken as Insert and second one is to be taken as update. This logic is also written in the adaptor application.
This was working fine, but then we deployed this solution on a clustered enrironment. Now if two requests with the same external_key come in , and both goes to different cluster nodes, we get two records in the database with same external_key, as both of them first check if the record exists in DB, don't find it and creates an insert in the database.
How to deal with such scenarios in a clustered environment? Basically how concurrent things run in clustered environment?
a) Pessimistic:- Lock the table explicitly for all updates before you insert/update the record. The first process will lock, perform the insert. The second process will wait on the lock. When the first process commits, it will release the lock and second process will go and update the row. Here, you want to make sure that your transaction is short lived, since processes will block each other. This solution is very simple to implement although it can cause bottlenecks
b) Optimistic:- Put a unique constraint on the column, do no locking. The second process will obviously fail on commit. It should just catch the exception and retry the process. Second time it will update the record.
c) Dirty records:- Allow duplicate records with the same external_key. Execute an end of the day process that runs through and finds these "dirty" records and merges them together. although not ideal, this solution is the one that performs much better than the other 2 and might make your code that inserts data less complicated. It might cause the component that reads the data complicated since it will have to contend with duplicate external_keys
Another way this might work is if you allow Dirty reads(TRANSACTION_READ_UNCOMMITTED). This will work only if the database supports Dirty reads. Note Oracle doesn't support Read Uncommitted. I haven't tried this out, so I'm not sure what all the repercussions will be
subject: Concurrent execution in clustered environment.