aspose file tools*
The moose likes Java in General and the fly likes Key comparison Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of EJB 3 in Action this week in the EJB and other Java EE Technologies forum!
JavaRanch » Java Forums » Java » Java in General
Bookmark "Key comparison" Watch "Key comparison" New topic
Author

Key comparison

Arjunkumar Shastry
Ranch Hand

Joined: Feb 28, 2005
Posts: 986
We have two tables.Source table generally contains 100 to 200 records.Destination table has records more than 3,00000.For trasnferring the records which need to be pushed to destination table,stored procedure has been used.This procedure confirms whether <customer name> does not exist in destination table prior to push.Somehow,it may happen that,<Customer name> does not exist but primary key of another customer in destination table matches with primary key of record which needs to be pushed.To avoid this,we need to check whether source table primary keys are not there in destination table.What is the efficient method for comparison.?
That is,in one table we have 100 numbers,in another table 300000 numbers.Now linear searching for checking existance of key will be too expensive.



Namma Suvarna Karnataka
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
The customer name and primary key columns need to be indexed columns. Query for records in target table where first name or primary key are equal to the source record. Searching indexed columns is extremely efficient: search will not be linear it will be binary.
Arjunkumar Shastry
Ranch Hand

Joined: Feb 28, 2005
Posts: 986
Thanks
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Side note: don't join when you do your search:

select x, y from source, target where source.name = ....

Not too bad when indexed and one of the tables is small, but you will be better off looping over the source table and then comparing the values in parameterized queries:

select count(*) from target where target.name = ? or target.key = ?

If this is familiar, great; if not, it's an invaluable tool to add to your arsenal and worth a little digging.
Rick Portugal
Ranch Hand

Joined: Dec 17, 2002
Posts: 243
The best thing to do is to let your relational database management system handle it for you. Create a unique index where you want to prevent duplicate records from being inserted. Build referential intergity into your database by creating constraints between the two tables. Relational database vendors have gone to great lengths to optimize that so you don't have to.


IBM 286, SCJP, SCWCD, EIEIO
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
>> The best thing to do is to let your relational database management system handle it for you.

The DBM cannot handle it for you, at best it can refuse to handle it because you've violated referential integrity.

>> Create a unique index where you want to prevent duplicate records from being inserted.

The primary keys are already unique by definition. That leaves the customer name and if it's not unique he can't stipulate that it's going to be unique unless he controls the schema. Furthermore, that does not solve the problem. What will happen is an exception will throw. You do not use exceptions for flow control; you check first.

>> Build referential intergity into your database by creating constraints between the two tables

He's probably got a staging table for transferring updates and besides, that does not solve the problem. You check efficiently, you add records, simple. You are making way too many assumptions IMO.

>> Relational database vendors have gone to great lengths to optimize that so you don't have to.

How is does a constraint that represents a programming error if violated represent optimization? That advise is not particularly helpful and overly general.
[ August 19, 2005: Message edited by: Rick O'Shay ]
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Arjunkumar, if asked to guess, I would say you have a target production table and a staging table containing updates. There's a good chance that neither table is modifiable by you or your group, although indexing the name would be a non-destructive enhancement they should add if not already there.

The primary key is already indexed. Referrential integrity would break the system as the source table is defined to have keys that do not exist in the target table. This is typical of staging tables for updates. You probably cannot blindly update and catch a database exception as a processing switch, even if it weren't a horribly bad idea. Even of none of those issues is correct, the advise I provided will work well for your stated problem and in general is an excellent approach.
Arjunkumar Shastry
Ranch Hand

Joined: Feb 28, 2005
Posts: 986
Thanks.
Yes,we have staging table and production table.We have The stored procedure to make the updates and inserts.Now sometimes,users can directly enter the records in production from another source.Our stored procedure which inserts the record fails to run when it finds same primary key(or indexing fields)in target prodction table.
Yes,indexing on 2 fields have been done.
I think its better to print the message inside stored procedure itself to find which record is causing the problem.
Rick Portugal
Ranch Hand

Joined: Dec 17, 2002
Posts: 243
Originally posted by Rick O'Shay:
The DBM cannot handle it for you, at best it can refuse to handle it because you've violated referential integrity.
That's actually the function of referential integrity. If you violate referential integrity the row is rejected. That is very different from "refusing to handle it" since that is the desired behavior. Rejecting the row IS the proper handling.
Originally posted by Rick O'Shay:
How is does a constraint that represents a programming error if violated represent optimization? That advise is not particularly helpful and overly general.
Um, constraints do not "represent a programming error". A constraint is a declarative way to define a business rule within the database. Setting referential integrity, primary keys, and unique indexes once in your database solves the problem forevermore. You need to handle SQLExceptions anyway. That is the preferred solution. Too many people rewrite the logic that is already a part of their RDBMS. It's not called a "DBM", by the way.
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
>> If you violate referential integrity the row is rejected.

Yes, the row is rejected and if you're in a transaction context it is rolled back, further complicating matters if you are trying to implement business logic using exceptions. Even goto is better than that.

>> Um, constraints do not "represent a programming error".

Who said they did? Violating a constraint is an error, not a case in a switch statement.

Never use exceptions for flow control. Item #2 under best practices.

http://www.onjava.com/pub/a/onjava/2003/11/19/exceptions.html?page=last

http://www.javapractices.com/Topic19.cjp

No worries, you're here to learn and hopefully you picked up another useful piece of information that will improve the quality of your code foing forward.
[ August 20, 2005: Message edited by: Rick O'Shay ]
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Yes, rick, it's called a DBM. Not all DBMS are relational and there's no point in being specific when the general will do. As for being the preferred method, that's not supported by any information intuitive or actual.

I can tell you that when millions of records are involved in an system with exclusive access, a common practice is to disable constraints in production. I've worked with several large system designers and fully have of them don't use constraints and prefer programmatic checks. Why? Because you need to do that anyway. They know, as any skilled practionner knows, that business logic should not be located in a catch clause.
Rick Portugal
Ranch Hand

Joined: Dec 17, 2002
Posts: 243
Originally posted by Rick O'Shay:
Yes, rick, it's called a DBM. Not all DBMS are relational and there's no point in being specific when the general will do.
Wow. DBMS is an acronym for Database Management System. A DBMS can be relational, hierarchical, etc. There is no reference to "relational" in the term DBMS. A Relational Database Management System is actually called an RDBMS. Guess what the R stands for.

Neither is called a "DBM".
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531

A textbook case of a disruptive technology

... Most indicative of its mantle as a true disruptive technology, MySQL proved that many of the missing high-end features weren't as indispensable as people used to claim. For instance, referential integrity (jeez, who could be opposed to integrity?) wasn't required in a database when it could be achieved in the application code, often more reliably. You could also achieve efficient locking without row-level locks; in fact, supporting row-level locks took so much overhead that the application was almost better without them.

http://www.oreillynet.com/pub/wlg/4715




Well, I always design in and enable constraints and disable only for data loads or where performance is a serious problem. In all cases, it's backed by programmatic checks. If there's a serious bottleneck, there are ways to avoid that and, anyway, exception catching is not high performance. It may be inside Oracle, but I'm not writing PL/SQL for the most part.
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Well, you must really have a low opinion of me to actually start spelling out acronyms that have been around for 30 years. It's a message board, Rick, and if you see DB or DBM assume it's reasonable short form for the topic at hand. I'm sure my dog knows what an RDBMS is and what the R stands for, who doesn't.

Do me a favor, do some research on best practices with exceptions and whether, specifically, you should use catch clauses in SQL exceptions to implement the business logic. That's what we're discussing, not what SQL or RDBMS means. See, we all know that already, Rick. If it's something you just learned and felt compelled to share, then accept my apology.
Rick Portugal
Ranch Hand

Joined: Dec 17, 2002
Posts: 243
Originally posted by Rick O'Shay:
The DBM cannot handle it for you

Originally posted by Rick Portugal:
It's not called a "DBM", by the way.

Originally posted by Rick O'Shay:
Yes, rick, it's called a DBM.

Originally posted by Rick O'Shay:
That's what we're discussing, not what SQL or RDBMS means. See, we all know that already

It depends on what you mean by "we". If you are including yourself in "we" then you are clearly incorrect. All you have to do is to read the thread to see that.
Originally posted by Rick O'Shay:
Well, you must really have a low opinion of me to actually start spelling out acronyms that have been around for 30 years.

Draw your own conclusions.
[ August 20, 2005: Message edited by: Rick Portugal ]
Rick O'Shay
Ranch Hand

Joined: Sep 19, 2004
Posts: 531
Yeah, um... don't use exceptions to implement business logic in your application. Somehow I think that advice got lost in the din of TLA sidebars.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Key comparison
 
Similar Threads
How to Determine Whether to Insert or Update
oracle record fetch
When does the hibernate query commit?
Hibernate supports for PK
entity bean for a table with No Primary key : Please Help,,,, URGENT