File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Best way to insert a record without duplicating Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Best way to insert a record without duplicating" Watch "Best way to insert a record without duplicating" New topic
Author

Best way to insert a record without duplicating

Emanuele Mazzante
Ranch Hand

Joined: Jan 14, 2014
Posts: 50
    
    1
Hi,

I'm moving the first steps with JDBC and Derby and I'm trying to make some small program to enter and retrieve data in a database.

I've a People table with Name, Last_Name, Birth_Date, Nationality and Person_ID (which is the Primary Key auto generated incremented by 1).

The user fills the fields on a form and after the checks I would add those values ​only if not duplicated (I don't want 2 identical records, except of course the primary key).

Normally I would execute a select query on the db and if no record are returned, I insert the data. In my small test program may work, but I don't think that this method is suitable in a project which there are many connected users. Between the search and the insertion can be created the same record type.

So can you please tell me what is the best way to make input without duplicates? Thank you for the attention.
Abhay Agarwal
Ranch Hand

Joined: Feb 29, 2008
Posts: 1212
    
    1

I assume your uniqueness criteria is that combination of "Name, Last_Name, Birth_Date, Nationality" fields should be unique.
One way is already what you have mentioned - do select , compare and then insert if required .. but that shall be done at Code end (not at JDBC end).

One option -
Change DB table structure .. we can define composite key so make all fields Name, Last_Name, Birth_Date, Nationality and Person ID are defined as composite key. So now, DB shall itself check whether this combination is unique or not befor adding / updating any other records or row in table. Again this approach do not sound good as we have made composite key which contains every column of table. Not good.

Next option (and best I think ) is Stored procedure.
Rather directly inserting record in Db table, call Stored procedure. Inside Stored procedure, place you can place your uniqueness condition and then decide whether to insert new row or not.
Other way is to Triggers which can also help in securing uniqueness of records.


Ron McLeod
Ranch Hand

Joined: Feb 12, 2013
Posts: 329
    
    6

Abhay Agarwal wrote:Again this approach do not sound good as we have made composite key which contains every column of table. Not good.

I don't understand why this is bad. A unique constraint based on the combination of fields which need to be unique seems simple and effective.
Marshall Blythe
Ranch Hand

Joined: Feb 26, 2013
Posts: 32
Emanuele Mazzante wrote:
Normally I would execute a select query on the db and if no record are returned, I insert the data. In my small test program may work, but I don't think that this method is suitable in a project which there are many connected users. Between the search and the insertion can be created the same record type.


You're right to be concerned. The "select-before-insert" pattern sets up a classic race condition, and it should be avoided. Besides, duplicates are probably going to be fairly rare in reality, and the initial query would be a waste of resources. Just attempt the insert and let the database tell you if it failed. Make the database work for you: define a unique constraint on the desired columns and let the database reject any attempt to insert duplicates. Whenever an insert attempt causes a unique constraint violation a SQLException will be thrown with a vendor-specific error code, and you can catch and inspect this in your application. When the error code indicates that a unique constraint violation has occurred you can display a user-friendly error message to the user. Otherwise the SQLException can be re-thrown or treated as a fault condition as appropriate for your application.
K. Tsang
Bartender

Joined: Sep 13, 2007
Posts: 2584
    
    9

Given the person table has a auto-increment key (person_id) then you can check for uniqueness in your code. For example, in the Person class have the equals() and hashcode() methods using the other 4 attributes. If the equals() method returns true, then there is a duplicate. However, to do this in code, you will need to loop all records in table. If doing in the SQL side, you can use a where clause to check if the count is 1, if so then duplicate.

Regarding composite key, it can be very effective indeed. But is it necessary to use all fields/columns. If all columns are used as composite key, then that person_id column is useless.


K. Tsang JavaRanch SCJP5 SCJD/OCM-JD OCPJP7 OCPWCD5 OCPBCD5
E Armitage
Rancher

Joined: Mar 17, 2012
Posts: 892
    
    9
K. Tsang wrote:Given the person table has a auto-increment key (person_id) then you can check for uniqueness in your code. For example, in the Person class have the equals() and hashcode() methods using the other 4 attributes. If the equals() method returns true, then there is a duplicate. However, to do this in code, you will need to loop all records in table. If doing in the SQL side, you can use a where clause to check if the count is 1, if so then duplicate.

The problem with doing it in code is that multiple threads running at the same time can both (correctly) detect that it's not going to be a duplicate but they would then fail at the insert part. Better just update and handle the exception when a duplicate is detected.
Emanuele Mazzante
Ranch Hand

Joined: Jan 14, 2014
Posts: 50
    
    1
First of all thank you very much for your time and for your help. I really appreciate it.

What I want to avoid is exactly the race condition. In my sample program will never happen but I want to think about how to solve this problem in a multiuser (or multithreading) environment.

Apparently the solutions you kindly suggested are:

)- Composite primary key: I can remove the auto increment Person_ID field and make a composite field with all the remaining fields. In this case I can catch duplicate SQL exception and also inform the user about it. But this way is considered a good design? May be too redundant to have a field that relates the values ​​of other four fields? May adversely affect the performance?

)- Stored procedure and trigger: I have not yet studied it, however, I do not know if this allows me to inform the user of the non-inclusion because of the attempted duplication.

)- It could be possible to make something like this: INSERT INTO People (field1, field2,....) VALUES (value1, value2,....) WHERE NOT EXIST (SELECT * FROM People WHERE field1 = value1 AND field2 = value2 AND....) or I said a very stupid thing?

What is the best design pattern?
Thank you so much for your attention.


Marshall Blythe
Ranch Hand

Joined: Feb 26, 2013
Posts: 32
Emanuele Mazzante wrote:
)- Composite primary key: I can remove the auto increment Person_ID field and make a composite field with all the remaining fields. In this case I can catch duplicate SQL exception and also inform the user about it. But this way is considered a good design? May be too redundant to have a field that relates the values ​​of other four fields? May adversely affect the performance?


At this point I'd be more concerned about simplicity and correctness than performance. I've used this technique many times with with no adverse performance effects. True, there is some overhead in the database to maintain the index for the unique constraint, but this is what databases are designed to do. I think you'll find that the performance impact of performing a pre-select is greater than that for the index maintenance. At any rate, you won't know if you have a performance concern until you put a solution into effect and measure the results. Just define the constraint on the minimum subset of columns required to define uniqueness, and let the database do the rest.

Emanuele Mazzante wrote:
)- Stored procedure and trigger: I have not yet studied it, however, I do not know if this allows me to inform the user of the non-inclusion because of the attempted duplication.
)- It could be possible to make something like this: INSERT INTO People (field1, field2,....) VALUES (value1, value2,....) WHERE NOT EXIST (SELECT * FROM People WHERE field1 = value1 AND field2 = value2 AND....) or I said a very stupid thing?


I think these just shift the potential for a race condition from your Java app to the database because they both ultimately rely on a query being executed BEFORE the insert.
Maxim Karvonen
Ranch Hand

Joined: Jun 14, 2013
Posts: 104
    
  11
Emanuele Mazzante wrote:
)- Composite primary key: I can remove the auto increment Person_ID field and make a composite field with all the remaining fields. In this case I can catch duplicate SQL exception and also inform the user about it. But this way is considered a good design? May be too redundant to have a field that relates the values ​​of other four fields? May adversely affect the performance?

Not a composite field. Unique index on several fields. It is somewhat similar to "composite field", but is maintained by the DB.

Command to create index looks like

There should be a similar syntax to define multicolumn unique contraint for a CREATE TABLE statement. Just check docs for your DB. It can be a primary key. Something like
Emanuele Mazzante
Ranch Hand

Joined: Jan 14, 2014
Posts: 50
    
    1
Thank you for taking the time to consider my problem.

Marshall Blythe wrote:
Emanuele Mazzante wrote:
)- Stored procedure and trigger: I have not yet studied it, however, I do not know if this allows me to inform the user of the non-inclusion because of the attempted duplication.
)- It could be possible to make something like this: INSERT INTO People (field1, field2,....) VALUES (value1, value2,....) WHERE NOT EXIST (SELECT * FROM People WHERE field1 = value1 AND field2 = value2 AND....) or I said a very stupid thing?


I think these just shift the potential for a race condition from your Java app to the database because they both ultimately rely on a query being executed BEFORE the insert.


Great explanation and observation. I did not believe that these solutions have a time window for a race condition.

Maxim Karvonen wrote:
Not a composite field. Unique index on several fields. It is somewhat similar to "composite field", but is maintained by the DB.


Thanks for telling specified. Now I have the right information to search inside the database documentation.
Abhay Agarwal
Ranch Hand

Joined: Feb 29, 2008
Posts: 1212
    
    1

Just one question ... here it is specified that Unique keys can be used instead of composite key. Does unique key keeps a combination of all fields unique OR uniqueness is maintained per field ?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18886
    
    8

Abhay Agarwal wrote:Does unique key keeps a combination of all fields unique OR uniqueness is maintained per field ?


No, it's the combination of the fields which is unique. For example if "FirstName" and "LastName" comprised the unique keys, then the table could contain many records with LastName = 'Clapham' and many records with FirstName = 'Paul', but only one record where both of those were true.
Abhay Agarwal
Ranch Hand

Joined: Feb 29, 2008
Posts: 1212
    
    1

Thanks for the reply. Understood.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1846
    
  16

As others have already said, you should use a unique key constraint over the combination of columns you want to be unique. The database will implement this constraint as a unique index on those columns. The unique index is then enforced by the database itself, not be any application code i.e. you cannot insert duplicate records via any means - Java code, SQL, other applications etc. This is transaction-safe and means you don't have to worry about "dirty reads" etc, and you can rely on the database to make sure your records are unique across those columns without having to write any extra code at all.

You can have one primary key constraint on a table, and the PK column(s) must be unique and must have a value. This means your existing auto-increment ID can and should still be the PK on your table. It's very useful to have a unique numeric ID as your PK (this is sometimes called a "surrogate key"), because the value can be generated automatically from the database and is guaranteed to be unique (if you use an auto-increment or DB sequence).

Also, if you create child records in another table, the children would normally have a foreign key pointing back to the PK on the parent record. The FK columns are the same as those in the PK, so it's easier to have a single numeric ID as your PK, because you only need to include one column for the corresponding FK in the child table, rather than copying a combination of columns.

Meanwhile, if your business rules change so that the set of unique columns is different, you don't want to have to change your primary key, so again it's easier to keep the numeric ID as your PK because this is independent of the business rules.

You can have as many unique key constraints on your table as you need, for each unique combination of columns that you want to enforce. However, as mentioned above, these take up space in your indexes, so if there is no business need to enforce a particular unique combination of columns, you might decide not to do so.

If you use a combination of unique business data columns as your PK, there is a risk that one of the data values might change at some point, which is difficult to implement if it's part of the PK, but is not a problem if the modified column is just part of a unique key and your PK is a separate numeric value that never changes.

Under no circumstances should you try to enforce uniqueness via application code alone, because you will find it very difficult to catch all the edge cases e.g. where a transaction might have dirty reads on a multi-user system with lots of caching and competing threads etc. Relational databases provide robust and powerful tools to support unique/primary key constraints and indexing, so use them properly.


No more Blub for me, thank you, Vicar.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Best way to insert a record without duplicating