Dealing with duplicate rows in SQL Server
Finding and cleaning up duplicate rows in SQL Server tables
In case your data in tables needs to have unique constraint based on more than one value in the row, the proper solution for that would be a composite key.
Composite Key In SQL. Composite key, or composite primary key, refers to cases where more than one column is used to specify the primary key of a table. In such cases, all foreign keys will also need to include all the columns in the composite key.
Composite primary keys
Creating composite primary keys in SQL Server syntax is pretty easy
CREATE TABLE my_parts ( id_part1 INT NOT NULL , id_part2 INT NOT NULL , id_part3 INT NOT NULL , PRIMARY KEY ( id_part1, id_part2, id_part3 ) ); GO
In case of already existing table altering the composite key constraint is also very easy with a simple query
ALTER TABLE my_parts ADD PRIMARY KEY (id_part1, id_part2,id_part3); GO
But in case of legacy, existing systems, when you are not allowed to make big changes in an ongoing system, you have to deal with duplicates by simple finding them and then removing them from the table data.
This can be a real headache, especially if existing data is critical for live system running.
First thing is to find the duplicates
SELECT id_part1 , id_part2 , id_part3 , COUNT(*) AS [count] FROM dbo.my_parts (NOLOCK)GROUP BY id_part1 , id_part2 , id_part3 HAVING COUNT(*) > 1;
Composite unique constraint
In certain cases you might have one primary key for the table and one more composite key. In this case the query for finding duplicated will not help you much, since you will get number of duplicated an all the composite key values, but since primary key is anyway unique for every row, you cannot group by including the primary key in the query.
To have a tbale like this you need to follow a bit different approach for creating it in case you are starting from the scratch
CREATE TABLE my_parts ( id INT IDENTITY(1,1) NOT NULL, id_part1 INT NULL , id_part2 INT NULL , id_part3 INT NULL , PRIMARY KEY CLUSTERED (id), CONSTRAINT [CK_my_parts] UNIQUE (id_part1,id_part2,id_part3) ); GO
In case you are dealing with existing database which already has duplicates for the column values you want to include in composite unique constraint, before adding the constraint you need to clean up the table from duplicates. Only then you will be able to add composite unique contrsint to the existig table with existing data.
There is a good practice explained by Microsoft how to remove duplicates from the table https://support.microsoft.com/en-us/kb/139444 and it is ceratinly
This is a good practice to follow as it keeps the backup of records which are about to be deleted, so in case something goes wrong you can pull back the data, but in case you need something simpler you can use this query.
DELETE FROM dbo.my_parts WHERE id NOT IN ( SELECT MIN(id) FROM dbo.my_parts GROUP BY id_part1 , id_part2 , id_part3 );
After removing the duplicates you can alter table and add composite unique constraint.
References
Disclaimer
Purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. Author will not be held responsible for any failure or damages caused due to any other usage.
Comments for this article