Comparing two columns across tables for a remaining subset: answers

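【Question title】: Comparing two columns across tables for a remaining subset
【Posted】: 2022-01-24 18:15:12
【Question】:

How can I find the unique phone numbers in table_1 (collapsing them into a single column, while keeping the ID and date fields) and remove the phone numbers that also appear in table_2?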

table_1

ID	phone1	phone2	date
1	1111111111		2021-12-31
5	2222222222	3333333333	2020-11-08
8	5555555555		2021-03-15
14	7777777777	8888888888	2016-10-20

table_2

ID	phone1	phone2	date
567	4444444444	1111111111	2020-11-28
660	8888888888		2018-01-01
898	9999999999		2017-04-06

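No matter which phone column a number appears in, I want it removed from the final result. So ID 1 with phone 1111111111 is removed, because that number appears in phone2 of table_2.

Desired output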

ID	phone num	date
5	2222222222	2020-11-08
5	3333333333	2020-11-08
8	5555555555	2021-03-15
14	7777777777	2016-10-20
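
To try the queries in this thread against the sample data above, the two tables can be recreated roughly as follows. This is only a sketch: the column types are guesses (the question does not state them), and blank phone2 cells are loaded as NULL, which is also an assumption. If the real data stores blanks as empty strings, substitute '' for NULL.

```sql
-- Assumed schema: the column types are not given in the question.
CREATE TABLE table_1 (id integer, phone1 text, phone2 text, date date);
CREATE TABLE table_2 (id integer, phone1 text, phone2 text, date date);

-- Sample rows from the question; blank phone2 cells become NULL (assumption).
INSERT INTO table_1 (id, phone1, phone2, date) VALUES
    (1,  '1111111111', NULL,         '2021-12-31'),
    (5,  '2222222222', '3333333333', '2020-11-08'),
    (8,  '5555555555', NULL,         '2021-03-15'),
    (14, '7777777777', '8888888888', '2016-10-20');

INSERT INTO table_2 (id, phone1, phone2, date) VALUES
    (567, '4444444444', '1111111111', '2020-11-28'),
    (660, '8888888888', NULL,         '2018-01-01'),
    (898, '9999999999', NULL,         '2017-04-06');
```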

What I have so far, which seems to work, is this. However, I feel there must be a more efficient way to do it.

select * from (
    select id, phone1 as phone_num, date from table_1
    union all
    select id, phone2 as phone_num, date from table_1
) tmp

where phone_num not in (
    select phone1 as phone_num from table_2
    union all
    select phone2 as phone_num from table_2
)

order by id desc;

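【Comments】:

Add a WHERE clause to exclude the rows whose phone2 is blank/NULL. Beyond that, this is easy to read and maintain, and I don't see any significant performance improvement to be had. I suppose you could use EXCEPT instead of the WHERE ... NOT IN against table_2, since it is a set operator and may be more efficient.
The EXCEPT clause has this general form: select_statement EXCEPT [ ALL ] select_statement (postgresql.org/docs/7.4/sql-select.html). But what do you mean by "efficient"? Efficient to write? Efficient to execute, for performance? Efficient and easy to read? There are many ways to achieve this...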
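
On the blank/NULL point raised in the first comment: if the blank cells in table_2 are stored as NULL rather than as empty strings, `NOT IN` against a list that contains a NULL never evaluates to true, so the query in the question would return no rows at all. A minimal NULL-safe sketch using `NOT EXISTS`, assuming blanks are NULL and the table and column names from the question:

```sql
SELECT t.id, t.phone_num, t.date
FROM (
    SELECT id, phone1 AS phone_num, date FROM table_1
    UNION ALL
    SELECT id, phone2 AS phone_num, date FROM table_1
) AS t
WHERE t.phone_num IS NOT NULL          -- skip empty phone slots in table_1
  AND NOT EXISTS (                     -- keep only numbers absent from table_2
        SELECT 1
        FROM table_2 AS x
        WHERE t.phone_num IN (x.phone1, x.phone2)
      )
ORDER BY t.id DESC;
```

Unlike `NOT IN`, `NOT EXISTS` only asks whether a matching row was found, so a NULL in table_2.phone2 simply fails to match instead of turning the whole predicate into NULL.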

Tags: sql postgresql

【Solution 1】:

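This is a bit of overkill, but it shows two things:

We can help performance by eliminating the rows that have no value in phone2. (This assumes the value is NULL rather than an empty string.)
We use multiple CTEs to spell out what is done where and why, and use the EXCEPT set operator to subtract the "ExclusionSET" from the "baseSET". In my opinion this improves maintainability and readability, and there may be a slight performance gain, since set-based operations tend to perform better. That is not always the case, though, and you have to test to know.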

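WITH baseSET as (
    -- every (id, phone, date) combination from table_1, skipping empty phone2 slots
    select id, phone1 as phone_num, date from table_1
    union all
    select id, phone2 as phone_num, date from table_1 Where Phone2 is not null
),

ExclusionSET as (
    -- every phone number that appears in either column of table_2
    select phone1 as phone_num from table_2
    union all
    select phone2 as phone_num from table_2
),

RemainingPhones as (
    -- EXCEPT needs the same number of columns on both sides, so only
    -- phone_num is compared here (RemainingPhones is an added helper CTE)
    SELECT phone_num FROM baseSET
    EXCEPT
    SELECT phone_num FROM ExclusionSET
),

ResultSET as (
    -- join back to baseSET to recover id and date for the surviving numbers
    SELECT b.id, b.phone_num, b.date
    FROM baseSET b
    JOIN RemainingPhones r ON r.phone_num = b.phone_num
)

SELECT *
FROM ResultSET
ORDER BY id desc;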
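
Since any performance difference has to be measured rather than assumed, one way to run that test in PostgreSQL is `EXPLAIN ANALYZE`, which executes the statement and reports the actual plan and timings. A minimal sketch, shown here on the query from the question; running the same prefix on the CTE version allows a side-by-side comparison. On tables as small as the sample data the numbers will not be meaningful, so a realistic data volume is assumed.

```sql
-- EXPLAIN ANALYZE really executes the query, then prints the plan with timings.
EXPLAIN ANALYZE
SELECT *
FROM (
    SELECT id, phone1 AS phone_num, date FROM table_1
    UNION ALL
    SELECT id, phone2 AS phone_num, date FROM table_1
) tmp
WHERE phone_num NOT IN (
    SELECT phone1 FROM table_2
    UNION ALL
    SELECT phone2 FROM table_2
)
ORDER BY id DESC;
```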

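【Comments】:

I think this is the kind of efficiency I was after. If no major performance improvement is possible, then improving readability through CTEs is what I should do. Thanks for the solution.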

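【Solution 2】:

This is a bit "wordy", but it uses CTEs to break the work into stages: first a single list of the numbers to exclude, then the list of valid numbers (t1 and t2 here stand in for table_1 and table_2):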

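with exclude as (
    -- every phone number in t2, from either column (UNION removes duplicates)
    select phone1 as p from t2
    union
    select phone2 from t2
), nos as (
    -- blank out each phone of t1 that appears in the exclusion list;
    -- one join per phone column keeps exactly one row per t1 row, so a row
    -- whose two numbers are both excluded does not leak either of them
    select case when x1.p is not null then null else t1.phone1 end as phone1,
           case when x2.p is not null then null else t1.phone2 end as phone2,
           t1.id, t1.date
    from t1
    left join exclude x1 on x1.p = t1.phone1
    left join exclude x2 on x2.p = t1.phone2
)
-- collapse the two phone columns into one, skipping the blanked-out values
select id, phone1 as PhoneNum, date
from nos
where phone1 is not null
union
select id, phone2, date
from nos
where phone2 is not null;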

【Comments】:

【Solution 3】:

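cte1 is the union of the phone numbers in table_1,

and cte2 is the union of the phone numbers in table_2.

The final result comes from left joining cte1 to cte2 and keeping only the numbers that have no match in cte2: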

with cte1 as (
    -- one row per (id, phone number) found in table_1
    select id, phone1 phone_num, cr_date from table_1
    union
    select id, phone2 phone_num, cr_date from table_1 where phone2 is not NULL
),
cte2 as (
    -- every phone number found in table_2
    select phone1 phone_num from table_2
    union
    select phone2 phone_num from table_2 where phone2 is not NULL
)
-- anti-join: keep the cte1 rows with no matching number in cte2
select id, cte1.phone_num, cr_date
from cte1
left join cte2 on (cte1.phone_num = cte2.phone_num)
where cte2.phone_num is NULL
order by id;

Result

id | phone_num  |  cr_date
----+------------+------------
  2 | 2222222222 | 2020-11-08
  2 | 3333333333 | 2020-11-08
  3 | 5555555555 | 2021-03-15
  4 | 7777777777 | 2016-10-20

【Comments】: