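Delete of duplicate records: answers

【Question title】: Delete of duplicate records
【Posted】: 2021-03-17 09:08:00
【Question description】:

I have a table in which I want to identify duplicate records based on two columns (id and role), using a third column (unit) to select the subset of records to analyze and delete from. Here is the table with a few rows of sample data: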

id | role | unit
----------------
946| 1001 |   1
946| 1002 |   1
946| 1003 |   1
946| 1001 |   2 
946| 1002 |   2
900| 1001 |   3
900| 1002 |   3
900| 1001 |   3

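Analyzing units 1 and 2 should identify two rows to delete: 946/1001 and 946/1002. It does not matter whether the deleted row is the one labeled unit 1 or the one labeled unit 2. In a later step I will update all records labeled unit=2 to unit=1.

I have a SELECT statement that identifies the rows to delete: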

SELECT * FROM (SELECT 
        unit, 
        id, 
        role,  
        ROW_NUMBER() OVER (
            PARTITION BY 
                id, 
                role 
            ORDER BY 
                id, 
                role
        ) row_num
     FROM thetable WHERE unit IN (1,2)  ) as x
WHERE row_num > 1;

This query gives the following result:

id | role | unit
----------------
946| 1001 |   2 
946| 1002 |   2

Now I want to combine it with DELETE to remove the identified records. I am (I believe) very close with this statement:

DELETE FROM thetable tp1 WHERE EXISTS 

(SELECT 
        unit, 
        id, 
        role,  
        ROW_NUMBER() OVER (
            PARTITION BY 
                id, 
                role 
            ORDER BY 
                id, 
                role
        ) as row_num
     FROM 
        thetable tp2 
        WHERE unit IN (1,2) AND 
        tp1.unit=tp2.unit AND 
        tp1.role=tp2.role AND 
        tp1.id=tp2.id AND row_num >1
)

However, row_num is not recognized as a column. How should I modify this statement to delete the two identified records?

【Comments】:

You cannot use ROW_NUMBER() in a WHERE clause, nor can you reference an alias defined in the SELECT list at the same query level. See: Why no windowed functions in where clauses?
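
One way to work within that restriction is to compute the window function in an inner query and filter on its alias one level up. The sketch below does this and matches rows back to the table through PostgreSQL's `ctid` system column; using `ctid` and ordering by `unit` are assumptions made for illustration and do not come from the thread.

```sql
-- Sketch only: relies on the PostgreSQL-specific ctid system column
-- to identify the physical rows selected for deletion.
DELETE FROM thetable
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               -- number the rows of each (id, role) group, lowest unit first
               ROW_NUMBER() OVER (
                   PARTITION BY id, role
                   ORDER BY unit
               ) AS row_num
        FROM thetable
        WHERE unit IN (1, 2)
    ) AS x
    -- the alias is visible here because it was defined one level down
    WHERE row_num > 1
);
```

Because the match is on `ctid` rather than on (id, role, unit), two rows that are identical in all three columns are still treated as separate rows.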

Tags: sql postgresql duplicates subquery sql-delete

【Solution 1】:

This is simple with EXISTS:

DELETE FROM thetable t
WHERE t.unit IN (1,2)
AND EXISTS (
  SELECT 1 FROM thetable
  WHERE (id, role) = (t.id, t.role) AND unit < t.unit
)

See the demo.
Result:

>  id | role | unit
> --: | ---: | ---:
> 946 | 1001 |    1
> 946 | 1002 |    1
> 946 | 1003 |    1
> 900 | 1001 |    3
> 900 | 1002 |    3
> 900 | 1001 |    3
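
To reproduce this result locally, the sample data from the question can be loaded with a script along these lines. The column types are an assumption, since the question does not state them.

```sql
-- Assumed schema: integer columns are a guess, not given in the question.
CREATE TABLE thetable (
    id   integer,
    role integer,
    unit integer
);

-- The eight sample rows from the question.
INSERT INTO thetable (id, role, unit) VALUES
    (946, 1001, 1),
    (946, 1002, 1),
    (946, 1003, 1),
    (946, 1001, 2),
    (946, 1002, 2),
    (900, 1001, 3),
    (900, 1002, 3),
    (900, 1001, 3);

-- Run the DELETE from this answer, then inspect what is left;
-- six of the eight rows should remain.
SELECT id, role, unit FROM thetable ORDER BY unit, role;
```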

【Discussion】:

Very nice suggestion. It works! Thanks for setting up the demo case; that was really helpful.

【Solution 2】:

You could phrase it like this:

delete from thetable t 
where t.unit > (
    select min(t1.unit)
    from thetable t1
    where t1.id = t.id and t1.role = t.role
)

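This seems like a simple way to solve the assignment, which basically reads: delete every row for which another row exists with a smaller unit and the same id and role.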
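
The statement above compares against every unit in the table. If the delete has to stay within units 1 and 2, as the question describes, a variant might look like the sketch below; the two added unit filters are an assumption about the intended scope rather than part of the original answer.

```sql
-- Sketch: same idea, but only rows in units 1 and 2 are compared or deleted.
DELETE FROM thetable t
WHERE t.unit IN (1, 2)
  AND t.unit > (
      SELECT min(t1.unit)
      FROM thetable t1
      WHERE t1.id = t.id
        AND t1.role = t.role
        AND t1.unit IN (1, 2)   -- ignore rows outside the analyzed units
  );
```

On the sample data this removes 946/1001 and 946/1002 with unit 2 and leaves the unit 3 rows untouched.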

As for the query you were trying to write with row_number(), I think it would be:

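delete from thetable t
using (
    -- number the rows of each (id, role) group within the units under analysis
    select s.*, row_number() over(partition by id, role order by unit) rn
    from thetable s
    where s.unit in (1, 2)
) t1
where t1.id = t.id and t1.role = t.role and t1.unit = t.unit and t1.rn > 1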
