How do I prevent duplicates in a table?
The question "how do I remove duplicates?" is asked so often in the SQL Server newsgroups, that I think there needs to be more recommendations in database articles and books regarding the importance of data integrity, and how to protect your data through primary keys and unique constraints.
If you already have duplicates in your table and you want to remove them, you can see a technique (useful if you have a primary key and the duplicate values are in another column), and a couple of links describing methods for other scenarios, in Article #2431.
If you want to read about some techniques for preventing redundant data in the future, read on.
Primary Key / Unique Constraint
In my opinion, the best possible way to prevent corrupted or redundant data is to enforce data integrity at the table level. You can do this using a PRIMARY KEY CONSTRAINT; for example:
Now, if you try to insert a duplicate record, for example:
You will be greeted with this error (the constraint and object name will vary, of course):
In Access, you would get this error:
You can also create a PRIMARY KEY that encompasses multiple columns. Let's pretend for a moment that it would be useful to allow email and/or username to be repeated in a table, but not both. We could enforce this as follows:
See Article #2504 for information on choosing a primary key (the examples in this article are completely contrived and superfluous).
Now, if your table already has a PRIMARY KEY, you might have other columns that need to be unique. Maybe you don't want them to be part of the PRIMARY KEY. Or maybe you want multiple columns to be validated individually for uniqueness, rather than as a whole. Or maybe you only want uniqueness to be enforced if the value is known (e.g. NOT NULL). You can do this with a UNIQUE CONSTRAINT, by adding the keyword UNIQUE to a column's definition in the CREATE TABLE statement.
You can also apply the constraint on its own, and on single or multiple columns. So again, imagine that these enforcements of uniqueness actually made sense:
When you violate this constraint, you will get the following error (it might be 'p' depending on which constraint you actually violate):
You can also apply such a constraint to the table that already exists, by simply adding a UNIQUE INDEX:
When you violate this constraint, you will get the following error:
Of course, all of these errors can be trapped in ASP:
So, if you can't add a PRIMARY KEY, UNIQUE CONSTRAINT, or UNIQUE INDEX to the table, then you can check manually to make sure the value you are *about* to insert does not already exist.
From within SQL Server
Of these three pre-checking methods, this is best because minimal lag time and a single transaction can help ensure that peripheral changes to the table don't affect your check. In a stored procedure, you can do this:
Then from ASP:
From ASP code (Access or SQL Server)
Another possibility is to run the check first from ASP, separate from the INSERT. Imagine separate stored procedures, but for brevity I'm going to put the ad hoc SQL statements directly in the code:
A problem with this method is that there is a lag, albeit small, between the query that counts the matching rows, and the actual insert. So, it is technically possible that another user will have inserted the same value in between.
From client-side code (Access or SQL Server)
Another thing you can do (and I wrote about this, way back in Active Server Pages 2.0 Unleashed), is to pull data to the client side, and use client-side validation to prevent duplicates. Note that this method is only feasible for finite datasets. Client-side script is not going to be able to iterate through thousands of values, never mind millions. So this method is usually useful for a small admin table with a very small number of rows.
It is not recommended that you rely solely on client-side code to prevent duplicates in a server-side database; rather, it should be an enhancement to server-side validation that makes the experience better for the end user — they are told immediately, rather than after a round-trip, that the value already exists.
If, for some reason, you can't add a primary key or unique constraint, can't change the stored procedure, and can't change the ASP code, you can still enforce uniqueness in a table using an INSTEAD OF TRIGGER (introduced in SQL Server 2000). There are two problems with this method: one minor, and one major. A minor problem is that an INSTEAD OF TRIGGER cancels the INSERT, so you have to re-code the INSERT again in the case where the value is not a duplicate. Second, if the INSERT includes a TEXT or NTEXT column, the trigger can't deal with those values.
You could also, assuming the INSERT is called within an explicit transaction, use an AFTER TRIGGER to rollback the transaction if it detects a duplicate was inserted.
We're going to stall on creating trigger examples, at least for the time being, because they should be used as a last resort (mainly for performance reasons). If this is not the case, and trigger examples would be useful for you, please let us know.
Related ArticlesHow do I build a query with optional parameters?
How do I calculate the median in a table?
How do I create a store locator feature?
How do I deal with MEMO, TEXT, HYPERLINK, and CURRENCY columns?
How do I deal with multiple resultsets from a stored procedure?
How do I debug my SQL statements?
How do I determine if a column exists in a given table?
How do I enable or disable connection pooling?
How do I enumerate through the DSNs on a machine?
How do I find a stored procedure containing <text>?
How do I get a list of Access tables and their row counts?
How do I get the latest version of the JET OLEDB drivers?
How do I handle alphabetic paging?
How do I handle BIT / BOOLEAN columns?
How do I handle error checking in a stored procedure?
How do I ignore common words in a search?
How do I page through a recordset?
How do I present one-to-many relationships in my ASP page?
How do I prevent my ASP pages from waiting for backend activity?
How do I prevent NULLs in my database from mucking up my HTML?
How do I protect my Access database (MDB file)?
How do I protect my stored procedure code?
How do I protect myself against the W32.Slammer worm?
How do I remove duplicates from a table?
How do I rename a column?
How do I retrieve a random record?
How do I return row numbers with my query?
How do I send a database query to a text file?
How do I simulate an array inside a stored procedure?
How do I solve 'Could not find installable ISAM' errors?
How do I solve 'Operation must use an updateable query' errors?
How do I temporarily disable a trigger?
How do I use a SELECT list alias in the WHERE or GROUP BY clause?
How do I use a variable in an ORDER BY clause?
Should I index my database table(s), and if so, how?
Should I store images in the database or the filesystem?
Should I use a #temp table or a @table variable?
Should I use a view, a stored procedure, or a user-defined function?
Should I use recordset iteration, or GetRows(), or GetString()?
What are all these dt_ stored procedures, and can I remove them?
What are the limitations of MS Access?
What are the limitations of MSDE?
What are the valid styles for converting datetime to string?
What datatype should I use for my character-based database columns?
What datatype should I use for numeric columns?
What does "ambiguous column name" mean?
What is this 'Multiple-step OLE DB' error?
What is wrong with 'SELECT *'?
What naming convention should I use in my database?
What should I choose for my primary key?
What should my connection string look like?
When should I use CreateObject to create my recordset objects?
Where can I get this 'Books Online' documentation?
Where do I get MSDE?
Which database platform should I use for my ASP application?
Which tool should I use: Enterprise Manager or Query Analyzer?
Why are there gaps in my IDENTITY / AUTOINCREMENT column?
Why can I not 'open a database created with a previous version...'?
Why can't I access a database or text file on another server?
Why can't I use the TOP keyword?
Why do I get 'Argument data type text is invalid for argument [...]'?
Why do I get 'Not enough space on temporary disk' errors?
Why does ASP give me ActiveX errors when connecting to a database?
Should I use COALESCE() or ISNULL()?
Where can I get basic info about using stored procedures?