Why should I avoid NULLs in my database?
Joe Celko said it best: "NULLs confuse people..." (SQL For Smarties, ISBN 1558605762). McGoveran and Date add: "NULLs...are far more trouble than they are worth and should be avoided; they display very strange and inconsistent behavior and can be a rich source of error and confusion." (Guide to Sybase and SQL Server, ISBN 020155710X).
My sentiments exactly. Of course, I don't expect to convince you by flashing a few quotes from reputable authors. Let's talk for a minute about what exactly NULLs do to cause this type of reaction. The root problem is that NULL means "unknown." As a result, the engine cannot say whether one value is equal (or not equal) to another value when one or both values are NULL. This trickles down to many problems for a database engine and any associated applications. The following list details some of those problems:
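One such problem shows up in comparisons and counts. The following is a minimal sketch, assuming a hypothetical temp table #null_demo with a nullable adid column (both names are made up for illustration) and a session running with ANSI_NULLS ON:

```sql
SET ANSI_NULLS ON;  -- comparisons against NULL evaluate to UNKNOWN

-- Hypothetical temp table, for illustration only.
CREATE TABLE #null_demo
(
    id   INT NOT NULL,
    adid INT NULL
);

INSERT INTO #null_demo (id, adid) VALUES (1, 10);
INSERT INTO #null_demo (id, adid) VALUES (2, NULL);
INSERT INTO #null_demo (id, adid) VALUES (3, NULL);

-- Neither predicate is TRUE for a NULL adid, so both queries skip rows 2 and 3.
SELECT id FROM #null_demo WHERE adid = 10;    -- returns id 1
SELECT id FROM #null_demo WHERE adid <> 10;   -- returns no rows

-- NULLs have to be asked for explicitly.
SELECT id FROM #null_demo WHERE adid IS NULL; -- returns ids 2 and 3

-- COUNT(*) counts rows; COUNT(adid) counts only non-NULL adid values.
SELECT COUNT(*)    AS all_rows,        -- 3
       COUNT(adid) AS non_null_adids   -- 1
FROM #null_demo;

DROP TABLE #null_demo;
```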
Why the difference in count results? You would *think* that the count would give an identical rowcount regardless of the contents of the column. It does not: COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL. It is often recommended that "*" be avoided because it is inefficient (causing an extra call to the syscolumns table). In this case, however, if you allow NULL values in your columns, you run the risk of basing your count on a column which contains NULLs, leading to an inaccurate count. Thankfully, the engine is much more efficient in processing SELECT COUNT(*) than it is processing SELECT *.
Here is another more involved example. Let's say you're running a stats program, and someone has to enter things manually. What if they don't know the adid and/or siteid when they enter the data, and you're performing rollups against it? If you haven't used it before, WITH ROLLUP groups by your GROUP BY columns, then adds summary rows. It adds flags to each column when you're at a summary row, so that you can identify WHICH summary row it is. Guess what the flag is? NULL. So, try out this code:
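A minimal sketch of such a rollup, assuming a hypothetical temp table #stats with adid, siteid and a made-up clicks column (the table name, the clicks column and the sample data are assumptions for illustration):

```sql
-- Hypothetical stats table; no NULLs are allowed in the grouping columns.
CREATE TABLE #stats
(
    adid   INT NOT NULL,
    siteid INT NOT NULL,
    clicks INT NOT NULL
);

INSERT INTO #stats (adid, siteid, clicks) VALUES (1, 1, 10);
INSERT INTO #stats (adid, siteid, clicks) VALUES (1, 2, 20);
INSERT INTO #stats (adid, siteid, clicks) VALUES (2, 1, 5);

-- One row per (adid, siteid), plus a subtotal row per adid
-- and a grand total row; summary columns are flagged with NULL.
SELECT adid, siteid, SUM(clicks) AS total_clicks
FROM #stats
GROUP BY adid, siteid WITH ROLLUP;

DROP TABLE #stats;
```

With this sample data, the summary rows are (1, NULL, 30), (2, NULL, 5) and (NULL, NULL, 35). Every other row has real values in both columns.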
You'll see that the results clearly identify the summary rows with NULL flags. Unfortunately, if you have NULLs *in the data*, this becomes very difficult to process automatically. For example, run this now:
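A minimal sketch of the problem, assuming a hypothetical temp table #stats whose adid and siteid columns allow NULL (the table name, the clicks column and the sample data are assumptions for illustration):

```sql
-- Hypothetical stats table; this time the grouping columns allow NULL.
CREATE TABLE #stats
(
    adid   INT NULL,
    siteid INT NULL,
    clicks INT NOT NULL
);

INSERT INTO #stats (adid, siteid, clicks) VALUES (1,    1,    10);
INSERT INTO #stats (adid, siteid, clicks) VALUES (1,    NULL, 20);
INSERT INTO #stats (adid, siteid, clicks) VALUES (NULL, 1,    5);
INSERT INTO #stats (adid, siteid, clicks) VALUES (NULL, NULL, 7);

-- NULLs from the data and NULL flags from the rollup now share the same columns.
SELECT adid, siteid, SUM(clicks) AS total_clicks
FROM #stats
GROUP BY adid, siteid WITH ROLLUP;

DROP TABLE #stats;
```

With this sample data, (1, NULL) comes back twice: once as a data group (20) and once as the subtotal for adid 1 (30). (NULL, NULL) comes back three times: as a data group (7), as the subtotal for the NULL adid (12), and as the grand total (42).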
See the difference? Which rows are the summary rows now? Easy enough to figure out, if you have a small result set and time to straighten out the mess. However, if you've got a system that automatically (or on-demand) creates reports against a data warehouse, I think you can see how NULL values will put up some roadblocks.
Sometimes NULL values are unavoidable, because the information simply doesn't exist or is not known yet. Two such examples: birthdates and middle initials. Occasionally you will come across scenarios where you have incomplete information from a user, and you can choose either to wait for it or to decide that the information isn't necessary to complete the data. In the latter case, NULLs should be allowed. For a person without a middle initial, you could avoid NULLs by using an empty string (''). Hopefully, birthdate isn't part of your primary key, so you can store NULL until you get the additional information.
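A minimal sketch of that trade-off, assuming a hypothetical temp table #people (the table, column names and sample values are made up for illustration):

```sql
-- Hypothetical table: middle_initial falls back to an empty string,
-- while birthdate is allowed to stay NULL until it is known.
CREATE TABLE #people
(
    person_id      INT         NOT NULL PRIMARY KEY,
    first_name     VARCHAR(32) NOT NULL,
    middle_initial VARCHAR(1)  NOT NULL DEFAULT '',
    last_name      VARCHAR(32) NOT NULL,
    birthdate      DATETIME    NULL
);

-- Neither middle_initial nor birthdate is supplied here.
INSERT INTO #people (person_id, first_name, last_name)
VALUES (1, 'Pat', 'Smith');

-- middle_initial comes back as '', birthdate as NULL.
SELECT person_id, first_name, middle_initial, last_name, birthdate
FROM #people;

-- Fill in the birthdate once the user provides it.
UPDATE #people SET birthdate = '19700115' WHERE person_id = 1;

DROP TABLE #people;
```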
My suggestion is to use a default value in every case that makes sense, and declare all such columns explicitly as NOT NULL. The default in DDL for column creation, at least in SQL Server, is NULL if you don't specify. The following two statements create an identical pair of tables:
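A minimal sketch of such a pair, assuming hypothetical temp tables and a connection using the usual default settings (where an unspecified column is treated as nullable):

```sql
-- Hypothetical tables, for illustration only.
-- Nullability of note is stated explicitly.
CREATE TABLE #explicit_null
(
    id   INT         NOT NULL,
    note VARCHAR(32) NULL
);

-- Nullability of note is left to the session/database default.
CREATE TABLE #implicit_null
(
    id   INT         NOT NULL,
    note VARCHAR(32)
);

-- Under the assumed default settings, both inserts succeed,
-- which shows that note is nullable in both tables.
INSERT INTO #explicit_null (id, note) VALUES (1, NULL);
INSERT INTO #implicit_null (id, note) VALUES (1, NULL);

DROP TABLE #explicit_null;
DROP TABLE #implicit_null;
```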
However, the default behavior can change between servers, databases and even between executions in the same session, so play it safe and ALWAYS explicitly declare NULL or NOT NULL.
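A minimal sketch of how the default can flip within one session, assuming hypothetical temp tables and using the SET ANSI_NULL_DFLT_ON and SET ANSI_NULL_DFLT_OFF session options (turning one ON turns the other OFF):

```sql
-- Unspecified columns default to NULL.
SET ANSI_NULL_DFLT_ON ON;

CREATE TABLE #dflt_on (note VARCHAR(32));       -- hypothetical table
INSERT INTO #dflt_on (note) VALUES (NULL);      -- expected to succeed

-- Same session, same DDL, different default: unspecified columns
-- now default to NOT NULL.
SET ANSI_NULL_DFLT_OFF ON;

CREATE TABLE #dflt_off (note VARCHAR(32));      -- hypothetical table
INSERT INTO #dflt_off (note) VALUES (NULL);     -- expected to fail: column does not allow NULLs

DROP TABLE #dflt_on;
DROP TABLE #dflt_off;
```

Writing `note VARCHAR(32) NULL` or `note VARCHAR(32) NOT NULL` in the CREATE TABLE statements removes the dependency on these settings.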