Dealing with duplicate rows in SQL Server

Finding and cleaning up duplicate rows in SQL Server tables

  • Share

In case your data in tables needs to have unique constraint based on more than one value in the row, the proper solution for that would be a composite key.

Note

Composite Key In SQL. Composite key, or composite primary key, refers to cases where more than one column is used to specify the primary key of a table. In such cases, all foreign keys will also need to include all the columns in the composite key.

Composite primary keys

Creating composite primary keys in SQL Server syntax is pretty easy

CREATE TABLE my_parts
    (
      id_part1 INT NOT NULL ,
      id_part2 INT NOT NULL ,
      id_part3 INT NOT NULL ,
      PRIMARY KEY ( id_part1, id_part2, id_part3 )
    );
GO
    

In case of already existing table altering the composite key constraint is also very easy with a simple query

ALTER TABLE my_parts
ADD  PRIMARY KEY (id_part1, id_part2,id_part3);
GO
    

But in case of legacy, existing systems, when you are not allowed to make big changes in an ongoing system, you have to deal with duplicates by simple finding them and then removing them from the table data.

This can be a real headache, especially if existing data is critical for live system running.

First thing is to find the duplicates

SELECT  id_part1 ,
        id_part2 ,
        id_part3 ,
        COUNT(*) AS [count]
FROM    dbo.my_parts (NOLOCK)GROUP BY id_part1 ,
        id_part2 ,
        id_part3
HAVING  COUNT(*) > 1;
    

Composite unique constraint

In certain cases you might have one primary key for the table and one more composite key. In this case the query for finding duplicated will not help you much, since you will get number of duplicated an all the composite key values, but since primary key is anyway unique for every row, you cannot group by including the primary key in the query.

To have a tbale like this you need to follow a bit different approach for creating it in case you are starting from the scratch

CREATE TABLE my_parts
    (
	  id INT IDENTITY(1,1) NOT NULL,
      id_part1 INT NULL ,
      id_part2 INT NULL ,
      id_part3 INT NULL ,
      PRIMARY KEY CLUSTERED (id),
	  CONSTRAINT [CK_my_parts] UNIQUE (id_part1,id_part2,id_part3)
    );
GO
    

In case you are dealing with existing database which already has duplicates for the column values you want to include in composite unique constraint, before adding the constraint you need to clean up the table from duplicates. Only then you will be able to add composite unique contrsint to the existig table with existing data.

There is a good practice explained by Microsoft how to remove duplicates from the table https://support.microsoft.com/en-us/kb/139444 and it is ceratinly

This is a good practice to follow as it keeps the backup of records which are about to be deleted, so in case something goes wrong you can pull back the data, but in case you need something simpler you can use this query.

DELETE  FROM dbo.my_parts
WHERE   id NOT IN ( SELECT  MIN(id)
                    FROM    dbo.my_parts
                    GROUP BY id_part1 ,
                            id_part2 ,
                            id_part3 );
    

After removing the duplicates you can alter table and add composite unique constraint.

References

  • Share

Disclaimer

Purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. Author will not be held responsible for any failure or damages caused due to any other usage.

Comments for this article

comments powered by Disqus