This week's book giveaway is in the OCMJEA forum. We're giving away four copies of OCM Java EE 6 Enterprise Architect Exam Guide and have Paul Allen & Joseph Bambara on-line! See this thread for details.
Does the book (or you) have any position on allowing null column values when designing database tables? There always seems to be a difference of opinion whether they should be allowed in the database or whether the columns should be set to a default value.
At my shop, our DB guideline is to allow nulls only on columns defines as date data types since a default date of all zeros is not allowed in ANSI SQL. All other non-date data type columns require a default. It sure makes queries a lot easier to code when you don't have to worry about nulls, but once in a great while I run into a situation where the default value of 0 in a numeric field is a problem since it could also represent valid data.
What is your opinion on this issue?
Thank you kindly [ October 16, 2007: Message edited by: Charles McGuire ]
Originally posted by Bear Bibeault: Lynn, can you explain why? I've often heard this viewpoint and have no reason to doubt its wisdom, but it'd be nice to know the logic behind the wisdom.
And also, what do you do when there's a difference between a default value and data simply not being provided? I believe an example was given that you could use 0 as a default value for a numeric field, but what happens if that default value could be a valid bit of data. Then, do you end up with code constantly checking for various "default values" to see if a field has been supplied, or not? With null, you at least know that there's nothing there - you don't have to worry about if what's there is the default or something else.
Perhaps I haven't phrased that well, but I would like to know more about your reasoning on being generally "anti-null". I'm no database expert so I probably couldn't argue either way, but I'd like to hear your stance on it.
Sure, I'll expand on that. I will agree there are instances when using NULL is justifiable, but in general I avoid them. As you suggest, if 0 is a viable value for a column you can't very well use it for a default. That being said:
My biggest problem with using NULL is that it means "I have no value." You can't compare one NULL to anonther, making it impossible to compare two NULL values in a column.
I think allowing NULLs also leads to lazy data entry. I'm all for rigidly enforcing values when there is only a finite set that can fit into a field. It's not necessarily that *you* as the database designer would enter bad data, but once your little table goes out into the wild, you need to put as many constraints on it as reasonably possible. People can and will abuse your database if you let them.
Joined: Jan 18, 2005
I'd also like to hear Lynn's reasoning, but from my perspective.
Null values totally screw up queries. In our DBMS you can't test if something is equal to null.
That is not a valid statement. Something can't be equal to null. I can write it the following way:
That will work, but now I have a completely different operator. Some query tools also puke on nulls. What happens when I map those elements to a program? I have a different test, I can't assign one field to another if the field is null. It will blow up. That means more compound conditional statements.
Life is much easier without nulls, IMO.
Joined: Jan 18, 2005
Oops... Lynn replied while I was composing. Didn't mean to step on you.
Joined: Dec 20, 2001
Originally posted by Lynn Beighley: I think allowing NULLs also leads to lazy data entry. I'm all for rigidly enforcing values when there is only a finite set that can fit into a field. It's not necessarily that *you* as the database designer would enter bad data, but once your little table goes out into the wild, you need to put as many constraints on it as reasonably possible. People can and will abuse your database if you let them.
Okay, so let me toss out an example and tell me what you think of it and how you might handle this.
I'm working on an application in which physicians will be entering reviews of various hospitalizations. Quite honestly, doing the entire review process can take an hour or more (not because the data entry is particularly overwhelming, but because they have to look into multiple other systems to find the information they're after).
Now, near the bottom of that page is a drop down field which is populated with options that come from a database table. It's a required field, but, due to the length of time in which it may take to complete a review, the physicians can't always be expected to complete a review from start to finish without being interrupted and having to leave. To that end, the physicians can "Save" their work at any time to the database so that they can come back to it later.
Now, let's say a physician fills out half the information and leave the drop-down list in question blank. That person has to leave so they do a quick save and away they go. What would you populate into the database field for that drop down selection? In my application, it remains NULL until the user actually selects something.
I agree that NULL values can make life harder (and I have written code just to handle nulls, which I don't care for), but I don't see a cleaner option in this case. Is there a cleaner option that I'm simply not seeing?
In this case, I don't see the need for a NULL. There are a handful of valid values that the dropdown can take, choosing another one to serve as "unanswered" (even if it's a blank string) doesn't seem like an issue.
Where I think issues come up more is with numeric fields where setting aside an otherwise valid "magic number" to mean "empty" uses up that value. Similiarly with date values. [ October 16, 2007: Message edited by: Bear Bibeault ]
Joined: Jan 18, 2005
With date values, at least those that are represented by SQL date data types, you really have no choice. You have to use NULL, unless you want to use 0001/01/01 which really is repugnant.
For char/varchar/string types, the no-null mantra is really much easier. Empty set is preferable, or in the doctor case above, something like "unanswered".
Numerics are a bit tougher. One application had a numeric field for temperature setting in a refer container. Since they were shipping dry goods, the refer unit was not to be turned on. The default for a numeric data type is zero, and that's what they thought the temperature for the refer to be because they were depending on that field for the temperature and an indicator for refer setting. By using the one field, they accidentally froze dry goods and ruined the whole load.
The fix for that was to have a second column to indicate the requirement (Dry/Refer). The applications looked to that flag for guidance, not the temperature. It sounds like a hassle, but it was no more work than allowing nulls 'cuz it had to be tested either way.
In other words, in the rare cases where data is missing or that zero cannot represent missing, I've found it better to indicate that condition with something more explicit than NULL.
Originally posted by Charles McGuire: I've found it better to indicate that condition with something more explicit than NULL.
Very good point. I think all too often coders rely on implicit information (this applies to values outside of DBs as well) rather than more explicit indicators. (My most horrifying example: "I want to make a conditional decision based on the name of the JSP".... ye gods!)
Of course, if you're not careful you could trip up normalization...
If only there were a book that covered this sort of thing. :p Lynn? [ October 16, 2007: Message edited by: Bear Bibeault ]
Corey, That's an interesting scenario. My first instinct is to allow the nulls and have a status field to show whether the row is completed. Then I started thinking about whether I really have an "in progress" table and a "completed" table where the "in progress" allows nulls and the "completed" doesn't. Does this sound more complicated than necessary.