Teradata query optimisation: answers

【Question title】: Teradata query optimisation
【Posted】: 2020-06-22 01:39:39
【Question】:

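I need to use Teradata SQL Assistant to select a set of people from a database, subject to certain conditions. Which of the following approaches is faster, and why?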

Approach A

Create volatile table selection as ( 
Select * 
from table_a
Where id not in (sel id from table_b)
And id not in (sel id from table_c)
And id not in (sel id from table_d)
...
) With data primary index (id) on commit preserve rows;

Approach B

Create volatile table selection as ( 
Select * 
from table_a
) With data primary index (id) on commit preserve rows;
Delete from selection where id in (sel id from table_b);
Delete from selection where id in (sel id from table_c);
Delete from selection where id in (sel id from table_d);

【Question discussion】:

Tags: sql optimization teradata where-clause sql-delete

【Solution 1】:

You should test any query on your own data and your own database.

I would expect NOT EXISTS to perform better:

Select a.* 
from table_a a
where not exists (select 1 from table_b b where b.id = a.id) and
      not exists (select 1 from table_c c where c.id = a.id) and
      not exists (select 1 from table_d d where d.id = a.id) ;

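In particular, this can take advantage of indexes on table_b(id), table_c(id), and table_d(id). The semantics are also clearer. NOT IN with a subquery behaves surprisingly when the subquery returns a NULL: `id NOT IN (...)` can then never evaluate to true, so the query silently returns no rows at all.

That said, I would expect writing the correct query to be faster than creating a table and then deleting rows from it. The latter involves a lot of "make work": adding rows to a table only to delete them again.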
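The NULL pitfall can be reproduced with a small sketch. It uses an in-memory SQLite database as a stand-in for Teradata (an assumption: the tables and values are hypothetical, but the NOT IN / NOT EXISTS NULL semantics shown are standard SQL behaviour, not Teradata-specific).

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);           -- nullable column
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (2), (NULL);      -- one NULL in the subquery
""")

# NOT IN: for ids 1 and 3 the comparison with NULL is UNKNOWN, never TRUE,
# and id 2 matches, so no row passes the filter.
not_in_rows = conn.execute(
    "SELECT id FROM table_a WHERE id NOT IN (SELECT id FROM table_b)"
).fetchall()

# NOT EXISTS: only checks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print("NOT IN:    ", not_in_rows)      # → []
print("NOT EXISTS:", not_exists_rows)  # → [(1,), (3,)]
```

If `table_b.id` were declared NOT NULL, both queries would return the same two rows.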
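The "make work" argument can be illustrated with a sketch that runs both approaches on the same hypothetical data. It again assumes SQLite as a stand-in: a `TEMP` table plays the role of the Teradata volatile table, and it says nothing about actual Teradata timings. It only shows that both approaches end with the same rows, while Approach B first writes every row and then deletes some of them.

```python
import sqlite3

# Hypothetical sample data; ids are NOT NULL, so NOT IN and NOT EXISTS agree.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER NOT NULL);
    CREATE TABLE table_b (id INTEGER NOT NULL);
    CREATE TABLE table_c (id INTEGER NOT NULL);
    INSERT INTO table_a VALUES (1), (2), (3), (4), (5);
    INSERT INTO table_b VALUES (2);
    INSERT INTO table_c VALUES (4), (5);
""")

# Single filtered query: only the surviving rows are ever produced.
direct = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.id = a.id)
      AND NOT EXISTS (SELECT 1 FROM table_c c WHERE c.id = a.id)
    ORDER BY a.id
""").fetchall()

# Approach B: copy every row, then delete unwanted rows one table at a time.
conn.execute("CREATE TEMP TABLE selection AS SELECT * FROM table_a")
copied = conn.execute("SELECT COUNT(*) FROM selection").fetchone()[0]
deleted = 0
for source in ("table_b", "table_c"):
    cur = conn.execute(
        f"DELETE FROM selection WHERE id IN (SELECT id FROM {source})"
    )
    deleted += cur.rowcount  # rows removed by this DELETE
after_delete = conn.execute("SELECT id FROM selection ORDER BY id").fetchall()

print(direct)           # → [(1,), (3,)]
print(after_delete)     # → [(1,), (3,)]
print(copied, deleted)  # → 5 3
```

Three of the five copied rows exist only to be deleted, which is the wasted work the answer refers to.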

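【Discussion】:

【Solution 2】:

As Gordon wrote, NOT EXISTS will outperform NOT IN if these ids are defined as NULLable. Otherwise the two are equivalent; just compare the EXPLAIN output.

The three subqueries will be turned into three joins. Here is an alternative that uses only a single join: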

Create volatile table selection as ( 
Select * 
from table_a
Where id not in 
 ( sel id from table_b 
   union all
   sel id from table_c
   union all
   sel id from table_d
 )
...
) With data primary index (id) on commit preserve rows;

Of course, performance also depends on the number of rows in each table and on the existing indexes.

【Discussion】:
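A quick way to convince yourself that the single UNION ALL subquery returns the same rows as the three separate NOT IN conditions is to run both on the same data. This sketch assumes SQLite and hypothetical NOT NULL tables; it checks only that the results match, not the join plan, which depends on the Teradata optimizer and should be checked with EXPLAIN.

```python
import sqlite3

# Hypothetical NOT NULL tables; duplicates across sources do not affect NOT IN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER NOT NULL);
    CREATE TABLE table_b (id INTEGER NOT NULL);
    CREATE TABLE table_c (id INTEGER NOT NULL);
    CREATE TABLE table_d (id INTEGER NOT NULL);
    INSERT INTO table_a VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO table_b VALUES (2), (3);
    INSERT INTO table_c VALUES (3), (5);   -- 3 also appears in table_b
    INSERT INTO table_d VALUES (6);
""")

# One NOT IN per excluded table (Approach A's shape).
three_subqueries = conn.execute("""
    SELECT id FROM table_a
    WHERE id NOT IN (SELECT id FROM table_b)
      AND id NOT IN (SELECT id FROM table_c)
      AND id NOT IN (SELECT id FROM table_d)
    ORDER BY id
""").fetchall()

# A single NOT IN over the UNION ALL of all excluded ids.
one_subquery = conn.execute("""
    SELECT id FROM table_a
    WHERE id NOT IN (SELECT id FROM table_b
                     UNION ALL SELECT id FROM table_c
                     UNION ALL SELECT id FROM table_d)
    ORDER BY id
""").fetchall()

print(three_subqueries)  # → [(1,), (4,)]
print(one_subquery)      # → [(1,), (4,)]
```

UNION ALL is enough here because NOT IN only asks whether an id appears at all, so removing duplicates with UNION would be extra work.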