//  home   //  advanced search   //  news   //  categories   //  sql build chart   //  downloads   //  statistics
ASP FAQ Tutorials

   8000XXXX Errors
   ASP.NET 2.0
   Classic ASP 1.0
      Access DB & ADO
      General SQL Server & Access Articles
      Other Articles
      Schema Tutorials
      Sql Server 2000
      Sql Server 2005
   General Concepts
   Search Engine Optimization (SEO)

Contact Us
Site Map



ASP FAQ Tutorials :: Databases :: General SQL Server & Access Articles :: Why should I avoid NULLs in my database?

Why should I avoid NULLs in my database?

Joe Celko said it best: "NULLs confuse people..." (SQL For Smarties, ISBN 1558605762). McGoveran and Date add: "NULLs...are far more trouble than they are worth and should be avoided; they display very strange and inconsistent behavior and can be a rich source of error and confusion." (Guide to Sybase and SQL Server, ISBN 020155710X). 
My sentiments exactly. Of course, I don't expect to convince you by flashing a few quotes from reputable authors. Let's talk for a minute about what exactly NULLs do that cause this type of reaction. The first problem is that the definition of NULL is "unknown." So, one problem is determining whether one value is (not) equal to another value, when one or both values are NULL. This trickles down to many problems for a database engine and any associated applications. The following list details some of those problems: 
  • they are interpreted differently depending on compatibility level and ANSI settings; 
      For example, let's consider two values, x and y, that are both NULL. Since the definition of NULL is unknown, then you can't say x = y. However, with the ANSI setting ANSI_NULLs, this can be different. When this setting is FALSE, x = y ... however, when TRUE, x <> y. Confusing, no?
  • the storage engine has to do extra processing for each row to determine if the NULLable column is in fact NULL -- this extra bit can have performance implications;
  • they produce weird results when using calculations, comparisons, sorting and grouping;
  • they create problems with aggregates and joins, such as different answers for COUNT(*) vs. COUNT(column_name);
  • they produce unpredictable results in statistics computations, particularly WITH ROLLUP and WITH CUBE;
  • applications must add extra logic to handle inserting and retrieving results, which may or may not include NULL values;
  • they cause unpredictable results with NOT EXISTS and NOT IN subqueries (working backwards, SQL determines that NULL columns belong or do not belong to the result set, usually for the wrong reasons);
  • no language that supports embedded SQL has native support for NULL SQL values.
Here are some more specific examples: 
SELECT COUNT(state) FROM publishers 
SELECT COUNT(pub_name) FROM publishers
Why the difference in count results? You would *think* that the count would give an identical rowcount regardless of the contents of the column. It is often recommended that "*" be avoided because it is inefficient (causing an extra call to the syscolumns table) -- however in this case, if you allow NULL values in your columns, you run the risk of basing your count on a column which contains NULLs... leading to an inaccurate count. Thankfully, the engine is much more efficient in processing SELECT COUNT(*) than it is processing SELECT *. 
Here is another more involved example. Let's say you're running a stats program, and someone has to enter things manually. What if they don't know the adid and/or siteid when they enter the data, and you're performing rollups against it? If you haven't used it before, WITH ROLLUP groups by your GROUP BY columns, then adds summary rows. It adds flags to each column when you're at a summary row, so that you can identify WHICH summary row it is. Guess what the flag is? NULL. So, try out this code: 
USE pubs 

    adid INT, 
    siteid INT, 
    hitcount INT 

INSERT fakeStats(adid,siteid,hitcount) SELECT 1,1,40 
INSERT fakeStats(adid,siteid,hitcount) SELECT 1,1,20 
INSERT fakeStats(adid,siteid,hitcount) SELECT 1,2,30 
INSERT fakeStats(adid,siteid,hitcount) SELECT 1,3,40 
INSERT fakeStats(adid,siteid,hitcount) SELECT 2,1,40 
INSERT fakeStats(adid,siteid,hitcount) SELECT 2,2,60 
INSERT fakeStats(adid,siteid,hitcount) SELECT 2,2,20 
INSERT fakeStats(adid,siteid,hitcount) SELECT 2,2,30 
INSERT fakeStats(adid,siteid,hitcount) SELECT 2,3,10 
SELECT adid,siteid,SUM(hitcount) 
    FROM fakeStats 
    GROUP BY adid,siteid 
You'll see that the results clearly identify the summary rows with NULL flags. Unfortunately, if you have NULLs *in the data*, this becomes very difficult to process automatically. For example, run this now: 
UPDATE fakeStats SET adid   = NULL WHERE id = 4 
UPDATE fakeStats SET siteid = NULL WHERE id = 6 
SELECT adid,siteid,SUM(hitcount) 
    FROM fakeStats 
    GROUP BY adid,siteid 
See the difference? Which rows are the summary rows now? Easy enough to figure out, if you have a small result set and time to straighten out the mess. However, if you've got a system that automatically (or on-demand) creates reports against a data warehouse, I think you can see how NULL values will put up some roadblocks. 
Sometimes NULL values are unavoidable—because the information simply doesn't exist, or it is not known yet. Two such examples: birthdates and middle initials. Occasionally you will come across scenarios where you have incomplete information from a user, and therefore you can choose to either wait for it or decide that the information isn't necessary to complete the data—in which case, NULLs should be allowed. (For a person without a middle initial, you could avoid NULLs by using an empty string (''). Hopefully, birthdate isn't a part of your primary key, and this will allow you to store NULL until you get the additional information.) 
My suggestion is to use a default value in every case that makes sense, and declare all such columns explicitly as NOT NULL. The default in DDL for column creation, at least in SQL Server, is NULL if you don't specify. The following two statements create an identical pair of tables: 

    id INT 


    id INT NULL 
However, the default behavior can change between servers, databases and even between executions in the same session, so play it safe and ALWAYS explicitly declare NULL or NOT NULL.

    Related Articles

    Can I fix this mm/dd/yyyy <-> dd/mm/yyyy confusion once and for all?
    Could I get some help with JOINs?
    How can I tell which version of MDAC I'm running?
    How do I access MIN, MAX, SUM, COUNT values from SQL statements?
    How do I change column order in a table structure?
    How do I change the order of columns in a table?
    How do I concatenate strings from a column into a single row?
    How do I convert columns of values into a single list?
    How do I determine if a database exists?
    How do I document / compare my SQL Server database(s)?
    How do I get the IDENTITY / AUTONUMBER value for the row I inserted?
    How do I solve 'ADO Could Not Find The Specified Provider'?
    Should I use BETWEEN in my database queries?
    Why can't I use the * wildcard in a database search?
    Why do I get 'Syntax Error in INSERT INTO Statement' with Access?
    Why do I get weird results when using both AND and OR in a query?
    Why do some SQL strings have an 'N' prefix?
    Why does AbsolutePosition return as -1?
    Why doesn't SQL Server allow me to separate DATE and TIME?
    Why is Query Analyzer only returning 255 characters?
    How do I deal with an apostrophe (') in a SQL statement?



    Created: 10/14/2000 | Last Updated: 3/12/2005 | broken links | helpful | not helpful | statistics
    © Copyright 2006, UBR, Inc. All Rights Reserved. (50)


    Copyright 1999-2006, All rights reserved.
    Finding content
    Finding content.  An error has occured...