Set duplicate values to NULL on certain conditions

[Question title]: Set duplicate values to NULL on certain conditions
[Posted]: 2019-01-29 15:01:57
[Question description]:

I have a fairly large SQL database (currently SQLite3) with a single relation:

CREATE TABLE sometable (
    `name` TEXT,
    `position` INTEGER
);

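Because of the nature of the data, there is no primary key and there are no constraints, only non-unique indexes on the name and position columns. I now need to set the name column to NULL wherever a name is duplicated but its position is not. Duplicate (name, position) pairs are fine and should not be changed.

Before: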

name | position
-----+---------
a    | 5
a    | 5
b    | 7
b    | 8
c    | 6
c    | 7
c    | 7
d    | 6

After:

name | position
-----+---------
a    | 5
a    | 5
NULL | 7
NULL | 8
NULL | 6
NULL | 7
NULL | 7
d    | 6

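The only rows I can keep are those where the name/position association is unambiguous. Duplicate name/position pairs must be kept, however, as long as the name is not associated with any other position.

I could not find a suitable SQL statement to do this.

[Question comments]:

Tags: sql sqlite duplicates

[Solution 1]:

Try an update with a correlated subquery that checks whether a given name should be replaced with NULL. The subquery below aggregates by name and then checks whether that name has more than one distinct position. If so, the name is a candidate for the update.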

UPDATE sometable
SET name = NULL
WHERE EXISTS (SELECT name FROM sometable t2
              WHERE sometable.name = t2.name
              GROUP BY name
              HAVING COUNT(DISTINCT position) > 1);

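[Discussion]:

SQLite3 seems to have a problem with table aliases in the UPDATE clause, so your example produces a syntax error. I hadn't thought of HAVING COUNT(DISTINCT position), good idea. Thanks.
Thanks, the query is accepted now. I'll check in a few hours whether it worked. It takes a while to execute.

[Solution 2]:

You can use union all together with not exists/exists: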

select t.name, t.position
from sometable t
where not exists (select 1 from sometable t1 where t1.name = t.name and t.position <> t1.position)
union all
select null, t.position
from sometable t
where exists (select 1 from sometable t1 where t1.name = t.name and t.position <> t1.position);

So, the update version would be:

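-- SQLite does not accept a qualified column name in SET, so the target table is referenced by its own name.
update sometable
     set name = null
where exists (select 1 from sometable t1 where t1.name = sometable.name and t1.position <> sometable.position);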

[Discussion]: