How to join tables by comparing two fields, and also considering performance: answers

【Question title】: How to join tables by comparing two fields, and also considering performance
【Posted】: 2019-03-31 14:15:49
【Question description】:

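This should be simple, but I can't wrap my head around it. I need to write a select that returns the updated date values for certain accounts.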

I start with this, T1:

+----------+---------+
|   date   | account |
+----------+---------+
| 4/1/2018 |       1 |
| 4/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

Then some of the dates are updated in T2:

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
+----------+---------+

How can I get the following output into T3, with only those accounts updated?

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

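I can join on the account number, but what about the accounts that did not change? How do I capture those?

Also, T1 has about 8 million records, so performance is a factor. The data is extracted from Teradata and loaded into Hive.

Thanks!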

【Question discussion】:

Tags: sql join hive teradata

【Solution 1】:

Just an addition to the earlier good answers: you can also try it with coalesce. Let me know if it improves performance.

select t1.Account, coalesce(t2.Date, t1.Date) 
from t1
left outer join t2
  on t2.Account = t1.Account
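
If the result has to be stored as T3 rather than just selected, the same join can be wrapped in a CREATE TABLE ... AS SELECT. This is only a sketch under a few assumptions: the target engine is Hive, t3 does not exist yet, the columns are literally named date and account, and each account appears at most once in t2 (otherwise the join duplicates rows). The backticks around date are there because it may be treated as a reserved word, depending on the Hive version and configuration.

```sql
-- Sketch: build t3 from t1, taking the date from t2 where an update exists.
-- Assumes Hive syntax and at most one t2 row per account.
CREATE TABLE t3 AS
SELECT
    COALESCE(t2.`date`, t1.`date`) AS `date`,  -- updated date if present, else the original
    t1.account
FROM t1
LEFT OUTER JOIN t2
    ON t2.account = t1.account;
```

With the sample data from the question, accounts 1 and 2 would get 7/1/2018 and accounts 3 to 5 would keep 4/1/2018.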

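【Discussion】:

It won't perform any better than CASE, but the syntax is shorter and more concise.
I agree. CASE is still useful if you have complex conditions across multiple columns.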
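
As an illustration of the kind of condition the comment above has in mind, here is a sketch where COALESCE alone would not be enough. The rule "ignore an update whose date is older than the original" is hypothetical and not part of the question; it is only there to show a condition that compares columns from both tables. Backticks around date assume Hive-style quoting.

```sql
-- Sketch: CASE with a condition that spans both tables.
-- The "ignore older updates" rule is a made-up example.
SELECT
    t1.account,
    CASE
        WHEN t2.account IS NULL    THEN t1.`date`  -- no update row for this account
        WHEN t2.`date` IS NULL     THEN t1.`date`  -- update row exists but carries no date
        WHEN t2.`date` < t1.`date` THEN t1.`date`  -- hypothetical rule: skip older updates
        ELSE t2.`date`
    END AS `date`
FROM t1
LEFT OUTER JOIN t2
    ON t2.account = t1.account;
```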

【Solution 2】:

I think you want:

select t2.*
from t2
union all
select t1.*
from t1
where not exists (select 1 from t2 where t2.account = t1.account);

This first selects everything from t2, then takes the remaining accounts (those with no row in t2) from t1.
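
The not exists filter can also be written as an anti-join, which may be worth trying if the engine or version in use handles correlated subqueries poorly. This is a sketch, not a benchmarked recommendation: whether it is faster depends on the optimizer. It lists the columns explicitly instead of using *, so the two halves of the union line up regardless of column order, and it assumes Hive-style backtick quoting for date.

```sql
-- Sketch: same result as the UNION ALL / NOT EXISTS query, using an anti-join.
SELECT t2.`date`, t2.account
FROM t2
UNION ALL
SELECT t1.`date`, t1.account
FROM t1
LEFT OUTER JOIN t2
    ON t2.account = t1.account
WHERE t2.account IS NULL;  -- keep only t1 rows with no match in t2
```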

【Discussion】:

【Solution 3】:

Here is another solution using a left outer join:

select t1.Account, case when t2.Date is null then t1.Date else t2.Date end
from t1
left outer join t2 on t2.Account = t1.Account

【Discussion】: