It's not a secret anymore!
The moose likes JDBC and Relational Databases and the fly likes Concurrency issue for dimensional tables Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC and Relational Databases
Bookmark "Concurrency issue for dimensional tables" Watch "Concurrency issue for dimensional tables" New topic

Concurrency issue for dimensional tables

Jayesh A Lalwani
Saloon Keeper

Joined: Jan 17, 2008
Posts: 2746

Not sure, if this is the correct place for this question

We have an OLAP database with a typical star shaped schema. It has several dimension tables and some fact tables. Dimension tables have values for the dimensions, and fact tables have FKs to the dimension tables. The values in the dimension tables have to be unique, otherwise the reports look messy

Now, we have this grid distributed application that does some massive computation and puts data into the OLAP tables. The input data gets split into batches, and the batches are sent to workers on the grid. The workers basically do some processing and then insert data into the fact tables. One of the challenges is the Dimension tables aren't prepopulated. So, if a worker sees an output that requires a new record in the dimension table, it has to insert into the dimension table itself.. AND the records in the dimension table should be unique.

Each worker basically does this (assuming for simplicity's sake that there is only one column in the fact table.. and only one dimension

When we insert into dimension table we do this query

When we insert into fact table, we do this

Each batch is executed in it's own transaction.

Again, this is a simple example. Actually, there are several dimension tables, and we insert into each dimension table if required, and do one insert into the fact table that joins all the dimension tables

Ok, so the problem is that when 2 differrent threads are concurrently executing batches that produce Output with the same value. Both threads insert a record into the DImension table. The changes are isolated because we haven't committed the transaction yet. Then both threads insert a record into fact table.. first record in fact table has the FK to first record in dimension table.. second record in fact table has a FK to the second record. Then both commit, which results in duplicate records in the dimension table. Now, a third batch starts up. This third batch doesn't insert any records in dimension table,. but inserts 2 records into fact table! Actually there are other constraints on the fact table that prevent us from have dup records, so third batch fails

So, my main problem is how do I do this concurrently without adding (too many) bottlenecks

I agree. Here's the link:
subject: Concurrency issue for dimensional tables
It's not a secret anymore!