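【Question title】: Reduce left outer join in SQL query
【Posted】: 2020-03-21 23:52:19
【Question description】:

Updated with SQL Fiddle.

I have an Oracle query in which I need to reduce the number of left outer joins so that it executes efficiently. The current query runs for more than 2 hours, and I want to lower its complexity by reducing the number of join operations.

Without the joins, the query runs in 15 minutes, so I want to rewrite the logic. Is there an efficient way to do this?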

WITH myquery AS
(
    SELECT * 
    FROM TEST_FILE1
)
SELECT 
    A.Col3, A.Col1, A.Col2, A.Col4, A.Col5
  --  D.CB,
  --  NVL(D.CD, 0), NVL(D.CE, 0), NVL(D.EF, 0),
    ,CASE WHEN V1.Col1 IS NULL THEN 0 ELSE 1 END AS QQ1
    ,CASE WHEN V2.Col3 IS NULL THEN 0 ELSE 1 END AS QQ2
    ,CASE WHEN V3.Col1 IS NULL THEN 0 ELSE 1 END AS QQ3
    ,CASE WHEN V4.Col3 IS NULL THEN 0 ELSE 1 END AS QQ4
    ,CASE WHEN V5.Col1 IS NULL THEN 0 ELSE 1 END AS QQ5
    ,CASE WHEN V6.Col3 IS NULL THEN 0 ELSE 1 END AS QQ6
    ,CASE WHEN V7.Col1 IS NULL THEN 0 ELSE 1 END AS QQ7
    ,CASE WHEN V8.Col3 IS NULL THEN 0 ELSE 1 END AS QQ8
FROM (
  SELECT Col3, Col1, Col2, Col4, Col5 
  FROM (
    SELECT distinct Col3
    FROM myquery
  ) A1
  CROSS JOIN (
    SELECT distinct Col1
    FROM myquery
  ) A2
  CROSS JOIN (
    SELECT distinct Col2
    FROM myquery
  ) A3
  CROSS JOIN (
    SELECT distinct Col4
    FROM myquery
  ) A4
  CROSS JOIN (
    SELECT distinct Col5
    FROM myquery
  ) A5
  WHERE Col3 = 42
) A
LEFT JOIN myquery D on NVL(D.Col3, '-') = NVL(A.Col3, '-') AND NVL(D.Col1, '-') = NVL(A.Col1, '-')
    AND NVL(D.Col2, '-') = NVL(A.Col2, '-') AND NVL(D.Col4, '-') = NVL(A.Col4, '-')
    AND NVL(D.Col5, '-') = NVL(A.Col5, '-')
LEFT JOIN (
  SELECT distinct Col1, Col3, Col5
  FROM myquery
) V1 on V1.Col1 = A.Col1 AND V1.Col3 = A.Col3 AND V1.Col5 = A.Col5
LEFT JOIN (
  SELECT distinct Col3, Col5, Col2
  FROM myquery
) V2 on V2.Col3 = A.Col3 AND V2.Col5 = A.Col5 AND V2.Col2 = A.Col2
LEFT JOIN (
  SELECT distinct Col3, Col5, Col1, Col2
  FROM myquery
) V3 on V3.Col3 = A.Col3 AND V3.Col5 = A.Col5 AND V3.Col1 = A.Col1 AND V3.Col2 = A.Col2
LEFT JOIN (
  SELECT distinct Col3, Col5, Col2
  FROM myquery
  WHERE Col1 in ('Bert','Myra')
) V4 on V4.Col3 = A.Col3 AND V4.Col5 = A.Col5 AND V4.Col2 = A.Col2
LEFT JOIN (
  SELECT distinct Col1, Col3
  FROM myquery
) V5 on V5.Col1 = A.Col1 AND V5.Col3 = A.Col3
LEFT JOIN (
  SELECT distinct Col3, Col2
  FROM myquery
) V6 on V6.Col3 = A.Col3 AND V6.Col2 = A.Col2
LEFT JOIN (
  SELECT distinct Col3, Col1, Col2
  FROM myquery
) V7 on V7.Col3 = A.Col3 AND V7.Col1 = A.Col1 AND V7.Col2 = A.Col2
LEFT JOIN (
  SELECT distinct Col3, Col2
  FROM myquery
  WHERE Col1 in ('Bert','Myra')
) V8 on V8.Col3 = A.Col3 AND V8.Col2 = A.Col2

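So far I have been considering analytic window functions, but I could not get the required output. Any leads would be highly appreciated.

Here is the input data of my test_file table: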

+------+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 | COL5 |
+------+------+------+------+------+
| Bert | "M"  |   42 |   68 |  166 |
| Carl | "M"  |   32 |   70 |  155 |
| Dave | "M"  |   39 |   72 |  167 |
| Elly | "F"  |   30 |   66 |  124 |
| Fran | "F"  |   33 |   66 |  115 |
| Hank | "M"  |   30 |   71 |  158 |
| Jake | "M"  |   32 |   69 |  143 |
| Luke | "M"  |   34 |   72 |  163 |
| Neil | "M"  |   36 |   75 |  160 |
| Page | "F"  |   31 |   67 |  135 |
| Alex | "M"  |   41 |   74 |  170 |
| Gwen | "F"  |   26 |   64 |  121 |
| Ivan | "M"  |   53 |   72 |  175 |
| Kate | "F"  |   47 |   69 |  139 |
| Myra | "F"  |   23 |   62 |   98 |
| Omar | "M"  |   38 |   70 |  145 |
| Quin | "M"  |   29 |   71 |  176 |
| Ruth | "F"  |   28 |   65 |  131 |
+------+------+------+------+------+

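From this table, I want to build every possible combination by applying a cross join over the distinct values of each column. With my filter on Col3 = 42, this produces 7776 records, because I only want all possible combinations for that one value of the column.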
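The 7776 figure can be cross-checked by multiplying the distinct counts of the four columns that are cross joined (Col3 is pinned to 42, so it contributes a factor of 1). This is only a sanity-check sketch and assumes the columns contain no NULLs: `COUNT(DISTINCT ...)` skips NULLs, whereas `SELECT DISTINCT` keeps one NULL row.

```sql
-- Expected size of the cross join of distinct values.
-- For the sample data: 18 names x 2 x 12 x 18 = 7776.
SELECT COUNT(DISTINCT Col1)
     * COUNT(DISTINCT Col2)
     * COUNT(DISTINCT Col4)
     * COUNT(DISTINCT Col5) AS combinations
FROM TEST_FILE1;
```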

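For each of these combinations, I want to check, using several different left outer joins, whether various subsets of its columns occur together in the table (that is, whether the joined side is null or not).

Output (partial):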

+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| COL3 | COL1 | COL2 | COL4 | COL5 | QQ1 | QQ2 | QQ3 | QQ4 | QQ5 | QQ6 | QQ7 | QQ8 |
+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+
|   42 | Page | "F"  |   68 |  176 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Alex | "F"  |   62 |  143 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Fran | "M"  |   66 |  175 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Omar | "F"  |   70 |  176 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Elly | "M"  |   72 |  124 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Quin | "M"  |   64 |  160 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Omar | "M"  |   64 |  158 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Kate | "F"  |   62 |  176 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Neil | "F"  |   69 |  145 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Dave | "F"  |   62 |  163 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|   42 | Ruth | "M"  |   70 |  115 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Bert | "M"  |   65 |  121 |   0 |   0 |   0 |   0 |   1 |   1 |   1 |   1 |
|   42 | Bert | "M"  |   72 |  145 |   0 |   0 |   0 |   0 |   1 |   1 |   1 |   1 |
|   42 | Omar | "M"  |   62 |  158 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
|   42 | Ruth | "M"  |   75 |  131 |   0 |   0 |   0 |   0 |   0 |   1 |   0 |   1 |
+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+

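【Comments】:

  • Sample data and output would help.
  • Sample data, the desired output, and a description of the task would be very helpful. Even assuming your current query is correct (which we don't know), the best we can do is guess your requirements from the code; a form of reverse engineering that is both a complete waste of time and error-prone. Besides showing the query you tried, tell us what the query is supposed to do.
  • @mathguy I have updated the question with sample input data.
  • @Avi I have updated the data.

Tags: sql oracle join left-join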


【Solution 1】:

When checking whether data exists in a table, we use EXISTS or IN, not JOIN (SELECT DISTINCT ...). So this is the query I would probably come up with:

WITH myquery AS
(
  SELECT * FROM TEST_FILE1
)
, a as
(
  select col1, col2, 42 as col3, col4, col5
  from
  (
    (select distinct col1 from myquery)
      cross join
    (select distinct col2 from myquery)
      cross join
    (select distinct col4 from myquery)
      cross join
    (select distinct col5 from myquery)
  )
)
select
  a.col1, a.col2, a.col3, a.col4, a.col5,
  case when (col1, col3, col5)       in (select col1, col3, col5       from myquery                               ) then 1 else 0 end as v1,
  case when (col2, col3, col5)       in (select col2, col3, col5       from myquery                               ) then 1 else 0 end as v2,
  case when (col1, col2, col3, col5) in (select col1, col2, col3, col5 from myquery                               ) then 1 else 0 end as v3,
  case when (col2, col3, col5)       in (select col2, col3, col5       from myquery where col1 in ('Bert', 'Myra')) then 1 else 0 end as v4,
  case when (col1, col3)             in (select col1, col3             from myquery                               ) then 1 else 0 end as v5,
  case when (col2, col3)             in (select col2, col3             from myquery                               ) then 1 else 0 end as v6,
  case when (col1, col2, col3)       in (select col1, col2, col3       from myquery                               ) then 1 else 0 end as v7,
  case when (col2, col3)             in (select col2, col3             from myquery where col1 in ('Bert', 'Myra')) then 1 else 0 end as v8
from a
order by a.col1, a.col2, a.col3, a.col4, a.col5;
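For comparison, the same kind of presence check can be written with a correlated EXISTS instead of a row-value IN. The block below is only a sketch of a single flag (the Col1 + Col3 check, `v5` above) on a reduced version of `a`; the aliases `c1` and `m` are introduced here for illustration, and whether EXISTS or IN performs better depends on the execution plan Oracle chooses.

```sql
WITH myquery AS
(
  SELECT * FROM TEST_FILE1
)
, a AS
(
  -- reduced candidate set: every distinct col1 paired with col3 = 42
  SELECT c1.col1, 42 AS col3
  FROM (SELECT DISTINCT col1 FROM myquery) c1
)
SELECT
  a.col1, a.col3,
  -- 1 if some row of myquery has this (col1, col3) pair, else 0
  CASE WHEN EXISTS (SELECT 1
                    FROM myquery m
                    WHERE m.col1 = a.col1
                      AND m.col3 = a.col3)
       THEN 1 ELSE 0 END AS v5
FROM a
ORDER BY a.col1;
```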

If your real query here, WITH myquery AS (...), is more than a simple SELECT * FROM TEST_FILE1, you may want to use the /*+MATERIALIZE*/ hint there to speed up access.
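As a rough illustration of where that hint goes: it is placed directly after the SELECT keyword inside the WITH clause. MATERIALIZE is a widely used but undocumented Oracle hint, so treat this as a sketch rather than guaranteed behavior; the two scalar subqueries are only there to reference `myquery` more than once.

```sql
WITH myquery AS
(
  -- ask Oracle to evaluate the subquery once and reuse the result;
  -- in the real query the expensive joins and filters would live here
  SELECT /*+ MATERIALIZE */ t.*
  FROM TEST_FILE1 t
)
SELECT
  (SELECT COUNT(DISTINCT col1) FROM myquery) AS distinct_col1,
  (SELECT COUNT(DISTINCT col5) FROM myquery) AS distinct_col5
FROM dual;
```

Checking the execution plan before and after adding the hint is the reliable way to see whether it changed anything.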

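【Comments】:

  • Thank you very much for the detailed query. In my query I also need a few columns from myquery, fetched through a left outer join like this: LEFT JOIN myquery D on NVL(D.Col3, '-') = NVL(A.Col3, '-') AND NVL(D.Col1, '-') = NVL(A.Col1, '-') AND NVL(D.Col2, '-') = NVL(A.Col2, '-') AND NVL(D.Col4, '-') = NVL(A.Col4, '-') AND NVL(D.Col5, '-') = NVL(A.Col5, '-'). If I include this left outer join in the modified query, the query runs forever. Interestingly, if I remove the left outer join, I get the records within a few seconds.
  • That is because you are calling functions on the columns. The DBMS does not understand what you are doing and scans the entire table for every row. One way to fix this might be to make your intent explicit with ON (d.col1 = a.col1 OR (d.col1 IS NULL AND a.col1 IS NULL)) AND (d.col2 = a.col2 OR (d.col2 IS NULL AND a.col2 IS NULL)) AND ... (In standard SQL we would write d.col1 IS NOT DISTINCT FROM a.col1 AND d.col2 IS NOT DISTINCT FROM a.col2, but Oracle does not support that yet.)
  • Another common approach is to provide a function-based index on the table and expressions in question: create index idx1 on d (NVL(d.col1, '-'), NVL(d.col2, '-'), NVL(d.col3, '-'), NVL(d.col4, '-'), NVL(d.col5, '-')). But since A is only an ad hoc view and not a table, you cannot create an index on it. Well, maybe an index on table D alone is enough, as it can still help the DBMS find its rows more quickly. I prefer the plain version with `ON (d.col1 = a.col1 OR (d.col1 IS NULL AND a.col1 IS NULL)) AND ...`, though. So if that solves the performance problem, I would go with it.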
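Spelled out for all five columns, the explicit NULL-aware join suggested in the comments might look like the sketch below. The aliases `c1`, `c2`, `c4`, `c5` and the output column `d_col1` are placeholders introduced here; replace `d.col1` with whichever columns of `D` are actually needed. Note one behavioral difference from the NVL version: a real '-' value no longer matches NULL. Whether this is actually faster should be confirmed against the execution plan.

```sql
WITH myquery AS
(
  SELECT * FROM TEST_FILE1
)
, a AS
(
  SELECT c1.col1, c2.col2, 42 AS col3, c4.col4, c5.col5
  FROM (SELECT DISTINCT col1 FROM myquery) c1
  CROSS JOIN (SELECT DISTINCT col2 FROM myquery) c2
  CROSS JOIN (SELECT DISTINCT col4 FROM myquery) c4
  CROSS JOIN (SELECT DISTINCT col5 FROM myquery) c5
)
SELECT
  a.col1, a.col2, a.col3, a.col4, a.col5,
  d.col1 AS d_col1   -- placeholder for the extra D columns
FROM a
LEFT JOIN myquery d
  -- each pair matches when the values are equal or both are NULL
  ON  (d.col1 = a.col1 OR (d.col1 IS NULL AND a.col1 IS NULL))
  AND (d.col2 = a.col2 OR (d.col2 IS NULL AND a.col2 IS NULL))
  AND (d.col3 = a.col3 OR (d.col3 IS NULL AND a.col3 IS NULL))
  AND (d.col4 = a.col4 OR (d.col4 IS NULL AND a.col4 IS NULL))
  AND (d.col5 = a.col5 OR (d.col5 IS NULL AND a.col5 IS NULL))
ORDER BY a.col1, a.col2, a.col4, a.col5;
```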