Monday, May 5, 2014

T-SQL: Remove duplicate rows

The following example partitions the query by columns foo and bar and generates a number for each sequence containing the same 'foo' and 'bar'. Then using common table expressions 'WITH cte', which stores temporary the result set in a table named 'cte' (or whatever you choose) and then uses the script 'DELETE cte where [RN] > 1' to remove and duplicate entries having more than one result set with the same 'foo' and 'bar' combination.

Be extra careful before using DELETE ;) I usually do a select query instead of a delete to see what I am about to delete and when i'm certain that the query is correct, I change it to a delete.


WITH cte AS (
  SELECT[foo], [bar], 
     row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
  FROM TABLE
)

DELETE cte WHERE [rn] > 1

Source
http://stackoverflow.com/questions/6025367/t-sql-deleting-all-duplicate-rows-but-keeping-one