Forum:

JDBC and Relational Databases

FOREIGN KEY

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's another one for you, then...

My database has a table A that maps integer tokens to some other data; the integer token is the PRIMARY KEY of this table. It then has another table B some of whose rows are integers that represent such tokens. Some of my queries then join those tables on the token. This all works fine.

I don't currently do so, but I believe that SQL would allow me to define the columns of B that are tokens as FOREIGN KEYs to table A.

What benefits (or the opposite) might come from defining those columns as FOREIGN KEYs? I am particularly interested if there's any potential performance gain.

In case it matters, the database is JavaDB (aka Derby).
[ May 04, 2007: Message edited by: Peter Chase ]

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

What benefits (or the opposite) might come from defining those columns as FOREIGN KEYs? I am particularly interested if there's any potential performance gain.

Performance is irrelevant here. What you are talking about is one of the basic core functions of a relational database: to maintain relationships between entities.

Currently as you have your data defined there is no relationship between TableA and TableB, you can perform operations on either independently of the other. Nothing will stop you deleting a row from TableA even if your application logic has inserted a row in TableB that used the id field from TableA. If you add a constraint, the integrity of your data is ensured because TableA is related to TableB, and that relationship is descibed via a foreign key constraint. It is now imposible to delete from TableA without also handling the child records in TableB.

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I guess by saying performance is irrelevant, you mean that the FOREIGN KEY constraint will have no effect on performance. I hope that's what you mean, rather than that performance as a topic is irrelevant; in this part of my application, almost nothing is more relevant than performance.

My application is a rather non-typical use of a relational database. Only a very small part of the code deals with the database, and the structure of the database is very simple. Without going into details, I can say that it would become very apparent very quickly if the two tables got "out of step". I don't really need constraint enforcement to tell me that.

If adding a FOREIGN KEY constraint really does have no performance impact, and it makes my application a better user of the relational database, then perhaps I should put it in. But won't checking the constraint take some time and hence reduce performance? If it does, I'm leaving it out.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I guess by saying performance is irrelevant, you mean that the FOREIGN KEY constraint will have no effect on performance. I hope that's what you mean, rather than that performance as a topic is irrelevant; in this part of my application, almost nothing is more relevant than performance

When designing the initial data model I would argue that performance is an irrelevant concern, with the caveat that you have no stated non-functional hard target (a defined TPS value, for example)?

You have two related entities, so you should define this relationship via a constraint, otherwise why define these entities in two seperate tables? Or indeed why use a relational database at all?

If your database is embedded in your application and so not shared you can reasonably assume that the application itself will manage relationships. If the data model is potentially shared then use constraints, otherwise you are entering into a maintenance and data integrity nightmare.

As for the performance considerations of a foreign key constraint the effect could be that inserts will be slower (since the constraint needs checked) but lookups should be quicker, since the index will help that. That aside this is only going to show itself once you get into larger data sets. Basically, as will all performance considerations, there isn't a hard and fast rule that "doing whatever will make things perform better". I'd write your model to accepted data modelling rules, then load and stress test you application. If you see an issue you can remove the constraint (assuming that is the cause).

What I'm trying to convey is that you are possibly worrying about performance at the wrong point in the design.
[ May 04, 2007: Message edited by: Paul Sturrock ]

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

This database-backed code is a replacement for an earlier file-backed code. At the moment, the database-backed code is more a low-level reworking of the file-backed code, just using the database as a byte store. It is hoped to take the next step, to more like a "proper" database application, in the medium term.

I do have a non-functional target for performance, which is that the database -backed version isn't slower than the file-backed version. Unfortunately, in several circumstances, it is slower. Hence my focus on performance.

The main reason for two separate tables is to save space. If the data that was in table A (the one with the tokens as the primary key) was moved to table B, it would be massively duplicated. A lot of rows of B refer to the same row of A.

The database is fully embedded in the application and managed by a small part of the overall code. The responsibilty for ensuring tables A and B are consistent lies with that code. No other code, and certainly no arbitrary remote clients, get to change the data in the database.

I do agree, though, that as there is no definite answer to which way is faster, or whether there is any difference at all, I ought to try it and measure the performance. If a version with FOREIGN KEY is as fast or faster, then I'd probably add the FOREIGN KEY.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

This database-backed code is a replacement for an earlier file-backed code. At the moment, the database-backed code is more a low-level reworking of the file-backed code, just using the database as a byte store. It is hoped to take the next step, to more like a "proper" database application, in the medium term.

Hmm. If you don't need referential integrity do you need any other aspects of a relational database? The file system will probably always be faster for storing binary data. Are you sure of your technology choice?

Unfortunately, in several circumstances, it is slower. Hence my focus on performance.

Ah, so you can test the app. and see a performance dip. Have you isolated the bottleneck? My gut feeling guess would be the overhead of streaming binary data into the database is the more likely culprit than not using an FK.

The main reason for two separate tables is to save space. If the data that was in table A (the one with the tokens as the primary key) was moved to table B, it would be massively duplicated. A lot of rows of B refer to the same row of A.

Is space an issue? Is your dataset liable to be very big? Denormalization is a common technique to improve performance, the reason being doing two table scans is usually slower than doing one.

The database is fully embedded in the application and managed by a small part of the overall code. The responsibilty for ensuring tables A and B are consistent lies with that code. No other code, and certainly no arbitrary remote clients, get to change the data in the database.

Fair enough - if you have complete control of the data model you can assume it will remain consistant. Again though if the data model is not shared are you sure a relational database is the best fit for your requirement?

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by Paul Sturrock:

Fair enough - if you have complete control of the data model you can assume it will remain consistant. Again though if the data model is not shared are you sure a relational database is the best fit for your requirement?

The current use of the relational database has some advantages over the file-backed code. The file-backed version generated thousands of tiny files, which were a pain to manage and slow to delete. Also, we will take advantage of the various tools that you get with JavaDB for backups etc.

Further intended revisions will make much better use of the relational nature. By storing less data as opaque binary BLOBs and more as specific typed columns, indexing, searching joins and all that good stuff will allow the application to become much better at searching historic data - something it is currently not great at.

If we never make that further step, the choice of relational DB technology may turn out to have been dubious.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Well, it may be too late to change. Your requirements sound like they might be better suited by the file system plus sn IR library such as lucene (since database are not good at allowing searches within binary content).

Then again, hindsight is a wonderful thing. Good luck!

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Lucene's just for text, isn't it? Much of this data isn't text and many likely searches are things like date ranges, page counts etc.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by Peter Chase:
Lucene's just for text, isn't it? Much of this data isn't text and many likely searches are things like date ranges, page counts etc.

No, Lucene will handle all sorts of data (though at the low level it handles it as text). Check out their website, you can handle MS file formats, PDFs etc.

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Kaydell Leavitt

Ranch Hand

Posts: 694

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I'm designing a database with referential integrity in mind.

So far, I've decided that almost all database tables will have a hidden ID field that it auto-incremented, is of type long, and can't be changed once the record is in the database.

For example, to implement the relationship between Vendors and Items -- which is many-to-many, there would be a link file, containing the hidden ID or the Item and the hidden ID of the Vendor. (I think that the only database tables that don't need a hidden ID field are the many-to-many link Tables.)

Will this design be enough? Should I ever use FOREIGN KEYs in addition to these hidden fields or instead of using these hidden fields?

Kaydell

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Something like:

You have a table, VENDOR, containing VENDOR_ID (auto ID) and columns for vendor details. Declare VENDOR_ID as primary key.

You have a table, ITEM, containing ITEM_ID (auto ID) and columns for vendor details. Delcare ITEM_ID as primary key.

You have a table, VENDORS_ITEMS, with columnns VENDOR_ID and an ITEM_ID. These columns should be declared as foreign keys to the VENDOR and ITEM tables respectively. The primary key of this table could be a compound key, comprising VENDOR_ID and ITEM_ID, or it could be another auto-ID; the former might give better performance, and it also guarantees no duplicate rows (is that what you want?).

Your application sounds like a pretty standard SQL database application, meaning you would want to use the foreign key facility to give referential integrity and better performance. My application is not at all what SQL databases were designed for, which is why it's not clear whether it's really a good idea to use foreign key in my situation.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by Kaydell Leavitt:

For example, to implement the relationship between Vendors and Items -- which is many-to-many, there would be a link file, containing the hidden ID or the Item and the hidden ID of the Vendor.

This is referential integrity, though not implemented using the database. Why not just use the tools it provides? What are you looking to gain by not using foreign key constraints?

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Kaydell Leavitt

Ranch Hand

Posts: 694

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

With ON UPDATE CASCADE, I think that one issue is that records that are external to your organization can't be changed. Records like invoices and purchase orders can't be changed because they are legal records.

Records like inter-store transfers have line-items like invoices and purchase orders but in this case, the records are internal and there is less of a reason to care about itemIDs changing.

I'm thinking that another example where ON UPDATE CASCADE may be OK is the relationship between Items and TaxCodes. In this case, the tax code is also on every invoice line-item record, but it doesn't print on the printed invoice, so it's more internal. In this case ON UPDATE CASCADE may work. Is the time it takes to do the cascaded update proportional to the number of records to be updated? Or is a FOREIGN KEY essentially the same thing as the hidden ID concept. That is, do the Item records all have a hidden ID that identifies the TaxCode record, allowing me to change the taxCode and automatically having all of the Item records still relate to the same TaxCode record?

Kaydell

Peter Chase

Ranch Hand

Posts: 1970

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

FOREIGN KEY is a constraint, not a column.

It says that the value in column X of table A is a key into column Y in table B. Column X and column Y must have compatible types, or the constraint cannot be created.

Any row inserted into table A must have a value in column X that matches an existing value in column Y of table B; if not, the constraint causes the insert to fail.

Any row deleted from table B must have a value in column Y that does NOT match any existing value in column Y of table B; if not, the constraint causes the delete to fail.

Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.

Paul Sturrock

Bartender

Posts: 10336

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

With ON UPDATE CASCADE, I think that one issue is that records that are external to your organization can't be changed. Records like invoices and purchase orders can't be changed because they are legal records.

I don't follow this. What do you mean "external to your organization"? Outside your database? If you don't want child records to be changed by an update to their parent, don't use ON UPDATE CASCADE.

Is the time it takes to do the cascaded update proportional to the number of records to be updated?

Most probably.

Or is a FOREIGN KEY essentially the same thing as the hidden ID concept.

Conceptually yes. You've basically given your entities a surrogate key, however the way you suggest implementing referential integrity is non-standard.

JavaRanch FAQ HowToAskQuestionsOnJavaRanch

Kaydell Leavitt

Ranch Hand

Posts: 694

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

With ON UPDATE CASCADE, I think that one issue is that records that are external to your organization can't be changed. Records like invoices and purchase orders can't be changed because they are legal records.

I don't follow this. What do you mean "external to your organization"? Outside your database? If you don't want child records to be changed by an update to their parent, don't use ON UPDATE CASCADE.

What I means is that a CPA, who was my boss, told me that some types of database are legal records and can't be changed. He said that the law trumps database-design theory. He said that invoices and purchase orders can't be changed. He said that the records in your database have to match the invoices that your customer received from you and that the purchase orders that your vendors received from you must match the records that you have in your database.

So, assume I won't use ON UPDATE CASCADE for itemIDs. Does this mean that the user can't change their itemIDs?

Kaydell
[ May 23, 2007: Message edited by: Kaydell Leavitt ]

Kaydell Leavitt

Ranch Hand

Posts: 694

I like...

posted 16 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

FOREIGN KEY is a constraint, not a column.

Thanks. I didn't understand that before.

It says that the value in column X of table A is a key into column Y in table B. Column X and column Y must have compatible types, or the constraint cannot be created.

So, is Table Y the parent of Table X? When using foreign keys, is there always a parent-child relationship?

Can you always use foreign keys for all types of relationships: a) one-to-one, b) one-to-many, c) many-to-one, and d) many-to-many?

What about many-to-many relationships where a table is related to itself?

Kaydell

Don't get me started about those stupid light bulbs.