【Question title】: Query Review - Snowflake
【Posted】: 2022-01-04 12:38:44
【Question description】:

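I have a query in Snowflake that works as expected, but I feel there must be a better way to do this. So I'm checking whether anyone has a better, more efficient solution.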

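I want to count how many users have AB4 and how many have AB5. Then I check whether they are multi_unit. For the multi_unit ones, I count how many of the other AB products (AB300, AB10, AB20, AB30) they have.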

Original table:

AB4_ind  AB5_ind  Multi_unit  AB300_ind  AB10_ind  AB20_ind  AB30_ind
1        0        1           1          1         0         1
1        0        0           0          0         0         0
0        1        0           0          0         0         0
0        1        1           1          0         0         0
1        1        1           0          0         1         0
0        1        1           0          0         1         1

Desired output of the query:

Product  CNT  Multi  AB300  AB10  AB20  AB30
AB4      3    2      1      1     1     1
AB5      4    3      1      0     2     1
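The sketch below is plain Python. It spells out what the desired output means, assuming every indicator is 0 or 1. For one product flag, it keeps the matching rows, counts them, and totals the other indicator columns. The names `COLUMNS`, `ROWS` and `summarize` are illustrative only and do not come from the question.

```python
# Sample rows copied from the original table in the question.
COLUMNS = ["AB4_ind", "AB5_ind", "Multi_unit", "AB300_ind", "AB10_ind", "AB20_ind", "AB30_ind"]
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]


def summarize(rows, product_col):
    """Count the rows flagged for one product and total the other indicators."""
    idx = {name: i for i, name in enumerate(COLUMNS)}
    selected = [r for r in rows if r[idx[product_col]] == 1]
    return {
        "CNT": len(selected),
        # Mirrors SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END)
        "Multi": sum(1 for r in selected if r[idx["Multi_unit"]] == 1),
        "AB300": sum(r[idx["AB300_ind"]] for r in selected),
        "AB10": sum(r[idx["AB10_ind"]] for r in selected),
        "AB20": sum(r[idx["AB20_ind"]] for r in selected),
        "AB30": sum(r[idx["AB30_ind"]] for r in selected),
    }


for product in ("AB4", "AB5"):
    print(product, summarize(ROWS, product + "_ind"))
```

Row 5 has both AB4_ind and AB5_ind set, so it counts toward both products. That is why AB4 ends up with Multi = 2 and AB20 = 1.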

This is the query that works, but I feel there must be a better way to do it. Please let me know your thoughts :) Much appreciated.

SELECT
'AB4' AS Product,
COUNT(*) AS CNT,
SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
SUM(AB300_IND) AS AB300,
SUM(AB10_IND) AS AB10,
SUM(AB20_IND) AS AB20,
SUM(AB30_IND) AS AB30
FROM TABLE.VIEW.MAW
WHERE AB4_IND = 1
GROUP BY 1
UNION
SELECT
'AB5' AS Product,
COUNT(*) AS CNT,
SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
SUM(AB300_IND) AS AB300,
SUM(AB10_IND) AS AB10,
SUM(AB20_IND) AS AB20,
SUM(AB30_IND) AS AB30
FROM TABLE.VIEW.MAW
WHERE AB5_IND = 1
GROUP BY 1

【Question discussion】:

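  • Please don't put links in your question. Nobody who is security-conscious will click random links in a post. Please update your question with the information as editable text.
  • Hey @NickW! Thanks for the comment. That makes a lot of sense, I just edited it! Happy Thanksgiving!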

Tags: sql query-optimization snowflake-cloud-data-platform snowflake-schema


【Solution 1】:

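UNION is overkill here. Each branch aggregates to a single row with a different Product literal, so the rows are already unique after aggregation. UNION ALL will run faster, because UNION performs an extra DISTINCT step over the combined result.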

SELECT
'AB4' AS Product,
COUNT(*) AS CNT,
SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
SUM(AB300_IND) AS AB300,
SUM(AB10_IND) AS AB10,
SUM(AB20_IND) AS AB20,
SUM(AB30_IND) AS AB30
FROM TABLE.VIEW.MAW
WHERE AB4_IND = 1
GROUP BY 1
UNION ALL                ----use UNION ALL instead of UNION
SELECT
'AB5' AS Product,
COUNT(*) AS CNT,
SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
SUM(AB300_IND) AS AB300,
SUM(AB10_IND) AS AB10,
SUM(AB20_IND) AS AB20,
SUM(AB30_IND) AS AB30
FROM TABLE.VIEW.MAW
WHERE AB5_IND = 1
GROUP BY 1
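Below is a small check you can run locally. It assumes Python's built-in `sqlite3` module as a stand-in for Snowflake and a hypothetical table `maw` loaded with the sample rows.

- It runs the two aggregated branches joined with `UNION` and again with `UNION ALL`.
- Both return the same two rows, because the `'AB4'`/`'AB5'` literal already makes them distinct.
- The last two queries show the case where the operators do differ: identical rows.

This checks results only, not performance.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for Snowflake here.
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE maw (AB4_IND INTEGER, AB5_IND INTEGER, MULTI_UNIT INTEGER, "
    "AB300_IND INTEGER, AB10_IND INTEGER, AB20_IND INTEGER, AB30_IND INTEGER)"
)
con.executemany("INSERT INTO maw VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# One aggregated branch per product; only the Product literal and filter differ.
BRANCH = """
SELECT '{p}' AS Product,
       COUNT(*) AS CNT,
       SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
       SUM(AB300_IND) AS AB300,
       SUM(AB10_IND) AS AB10,
       SUM(AB20_IND) AS AB20,
       SUM(AB30_IND) AS AB30
FROM maw
WHERE {p}_IND = 1
GROUP BY 1
"""

union_sql = BRANCH.format(p="AB4") + "UNION" + BRANCH.format(p="AB5")
union_all_sql = BRANCH.format(p="AB4") + "UNION ALL" + BRANCH.format(p="AB5")
union_rows = sorted(con.execute(union_sql).fetchall())
union_all_rows = sorted(con.execute(union_all_sql).fetchall())

# UNION only changes the result when branches produce identical rows.
dup_union = con.execute("SELECT 1 UNION SELECT 1").fetchall()
dup_union_all = con.execute("SELECT 1 UNION ALL SELECT 1").fetchall()

print(union_all_rows)
print(dup_union, dup_union_all)
```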

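A further optimization would be to get rid of the UNION ALL entirely... However, you cannot write a single query without UNION ALL that derives the product with a conditional expression like this: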

CASE WHEN AB4_IND = 1 THEN 'AB4'
            WHEN AB5_IND = 1 THEN 'AB5' END AS Product

and then group by it. If AB4_IND and AB5_IND are both 1 on the same row, the CASE returns only its first matching branch (AB4), so that row is never counted for AB5.
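To see the problem on the sample data, the sketch below runs the CASE-based grouping in SQLite through Python's `sqlite3` and compares it with per-product counts. SQLite is an assumed local stand-in for Snowflake, and `maw` is a hypothetical table name. Row 5, which has both flags set, lands only in the AB4 group, so AB5 comes out one short.

```python
import sqlite3

# Sample rows from the question; row 5 has both AB4_IND and AB5_IND set.
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE maw (AB4_IND INTEGER, AB5_IND INTEGER, MULTI_UNIT INTEGER, "
    "AB300_IND INTEGER, AB10_IND INTEGER, AB20_IND INTEGER, AB30_IND INTEGER)"
)
con.executemany("INSERT INTO maw VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Single-pass attempt: the CASE gives each row exactly one Product label.
case_counts = con.execute(
    """
    SELECT CASE WHEN AB4_IND = 1 THEN 'AB4'
                WHEN AB5_IND = 1 THEN 'AB5' END AS Product,
           COUNT(*) AS CNT
    FROM maw
    WHERE AB4_IND = 1 OR AB5_IND = 1
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

# Reference counts: a row flagged for both products belongs to both groups.
reference_counts = [
    (p, con.execute(f"SELECT COUNT(*) FROM maw WHERE {p}_IND = 1").fetchone()[0])
    for p in ("AB4", "AB5")
]

print(case_counts)
print(reference_counts)
```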

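You can still get rid of the second query by joining a constant two-row set that contains the required products ('AB4'), ('AB5'). This is shorter and may perform better: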

SELECT p.Product,
       COUNT(*) AS CNT,
       SUM(CASE WHEN m.MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
       SUM(m.AB300_IND) AS AB300,
       SUM(m.AB10_IND) AS AB10,
       SUM(m.AB20_IND) AS AB20,
       SUM(m.AB30_IND) AS AB30
  FROM (VALUES ('AB4'), ('AB5')) AS p (Product)
       INNER JOIN TABLE.VIEW.MAW m 
        ON (p.Product='AB4' and m.AB4_IND = 1) OR (p.Product='AB5' and m.AB5_IND = 1)
  WHERE (m.AB4_IND = 1) OR (m.AB5_IND = 1)
  GROUP BY p.Product;
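The sketch below runs the same join against the sample rows, using SQLite through Python's `sqlite3` as an assumed local stand-in for Snowflake and a hypothetical table `maw`. To stay within syntax SQLite accepts, the constant product set is written as a CTE with a column list (`WITH p(Product) AS (VALUES ...)`) instead of `(VALUES ...) AS p (Product)`. Row 5 matches both product rows in the ON condition, so it is counted under AB4 and AB5.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for Snowflake here.
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE maw (AB4_IND INTEGER, AB5_IND INTEGER, MULTI_UNIT INTEGER, "
    "AB300_IND INTEGER, AB10_IND INTEGER, AB20_IND INTEGER, AB30_IND INTEGER)"
)
con.executemany("INSERT INTO maw VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Each source row joins to every product whose indicator it has set.
JOIN_SQL = """
WITH p(Product) AS (VALUES ('AB4'), ('AB5'))
SELECT p.Product,
       COUNT(*) AS CNT,
       SUM(CASE WHEN m.MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
       SUM(m.AB300_IND) AS AB300,
       SUM(m.AB10_IND) AS AB10,
       SUM(m.AB20_IND) AS AB20,
       SUM(m.AB30_IND) AS AB30
FROM p
INNER JOIN maw AS m
        ON (p.Product = 'AB4' AND m.AB4_IND = 1)
        OR (p.Product = 'AB5' AND m.AB5_IND = 1)
WHERE m.AB4_IND = 1 OR m.AB5_IND = 1
GROUP BY p.Product
ORDER BY p.Product
"""

join_rows = con.execute(JOIN_SQL).fetchall()
print(join_rows)
```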

【Discussion】:

    【Solution 2】:

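    You could also try an UNPIVOT version of the solution. Note that PRODUCT will then hold the unpivoted column names (AB4_IND, AB5_IND) rather than AB4 and AB5: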

    SELECT
        PRODUCT,
        COUNT(1) AS CNT,
        SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
        SUM(AB300_IND) AS AB300,
        SUM(AB10_IND) AS AB10,
        SUM(AB20_IND) AS AB20,
        SUM(AB30_IND) AS AB30
    FROM TABLE.VIEW.MAW
        UNPIVOT(PRODUCT_SELECTED FOR PRODUCT IN (AB4_IND, AB5_IND))
    WHERE PRODUCT_SELECTED = 1
    GROUP BY 1
    ;
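SQLite has no UNPIVOT operator, so the sketch below emulates the same reshaping in SQLite through Python's `sqlite3`, with `maw` as a hypothetical table. Each source row is cross-joined with the two indicator names. A CASE then picks that indicator's value as PRODUCT_SELECTED. This is a swapped-in technique that mimics what UNPIVOT produces, not Snowflake's operator itself. The labels are the column names AB4_IND and AB5_IND, matching what UNPIVOT puts in PRODUCT.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for Snowflake here.
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE maw (AB4_IND INTEGER, AB5_IND INTEGER, MULTI_UNIT INTEGER, "
    "AB300_IND INTEGER, AB10_IND INTEGER, AB20_IND INTEGER, AB30_IND INTEGER)"
)
con.executemany("INSERT INTO maw VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Emulated unpivot: one output row per (source row, indicator column) pair.
UNPIVOT_LIKE_SQL = """
WITH unpivoted AS (
    SELECT n.PRODUCT,
           CASE n.PRODUCT WHEN 'AB4_IND' THEN m.AB4_IND ELSE m.AB5_IND END AS PRODUCT_SELECTED,
           m.MULTI_UNIT, m.AB300_IND, m.AB10_IND, m.AB20_IND, m.AB30_IND
    FROM maw AS m
    CROSS JOIN (SELECT 'AB4_IND' AS PRODUCT UNION ALL SELECT 'AB5_IND') AS n
)
SELECT PRODUCT,
       COUNT(1) AS CNT,
       SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
       SUM(AB300_IND) AS AB300,
       SUM(AB10_IND) AS AB10,
       SUM(AB20_IND) AS AB20,
       SUM(AB30_IND) AS AB30
FROM unpivoted
WHERE PRODUCT_SELECTED = 1
GROUP BY PRODUCT
ORDER BY PRODUCT
"""

unpivot_rows = con.execute(UNPIVOT_LIKE_SQL).fetchall()
print(unpivot_rows)
```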
    

    【Discussion】:

      【Solution 3】:

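      UNION usually performs better than OR, because the optimizer often struggles to get the estimates right. One thing that can help, though, is to aggregate only once by moving the UNION ALL into a subquery: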

      SELECT
          Product,
          COUNT(*) AS CNT,
          SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
          SUM(AB300_IND) AS AB300,
          SUM(AB10_IND) AS AB10,
          SUM(AB20_IND) AS AB20,
          SUM(AB30_IND) AS AB30
      from (  select 'AB4' AS Product,MULTI_UNIT,AB300_IND,AB10_IND,AB20_IND,AB30_IND
              FROM TABLE.VIEW.MAW
              WHERE AB4_IND = 1
              UNION ALL 
              select 'AB5',MULTI_UNIT,AB300_IND,AB10_IND,AB20_IND,AB30_IND
              FROM TABLE.VIEW.MAW
              WHERE AB5_IND = 1
      ) t group by product
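The check below confirms that stacking the filtered rows first and aggregating once gives the same numbers as aggregating each product separately, as in the question. It runs in SQLite through Python's `sqlite3`, used as an assumed local stand-in for Snowflake. The table `maw` and the helper `per_branch` are hypothetical names for illustration.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for Snowflake here.
ROWS = [
    (1, 0, 1, 1, 1, 0, 1),
    (1, 0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 0),
    (0, 1, 1, 0, 0, 1, 1),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE maw (AB4_IND INTEGER, AB5_IND INTEGER, MULTI_UNIT INTEGER, "
    "AB300_IND INTEGER, AB10_IND INTEGER, AB20_IND INTEGER, AB30_IND INTEGER)"
)
con.executemany("INSERT INTO maw VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Solution 3: stack the filtered rows with UNION ALL, then aggregate once.
STACKED_SQL = """
SELECT Product,
       COUNT(*) AS CNT,
       SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END) AS MULTI,
       SUM(AB300_IND) AS AB300,
       SUM(AB10_IND) AS AB10,
       SUM(AB20_IND) AS AB20,
       SUM(AB30_IND) AS AB30
FROM (
    SELECT 'AB4' AS Product, MULTI_UNIT, AB300_IND, AB10_IND, AB20_IND, AB30_IND
    FROM maw WHERE AB4_IND = 1
    UNION ALL
    SELECT 'AB5', MULTI_UNIT, AB300_IND, AB10_IND, AB20_IND, AB30_IND
    FROM maw WHERE AB5_IND = 1
) AS t
GROUP BY Product
ORDER BY Product
"""


def per_branch(product):
    """Aggregate one product on its own, as each branch of the question's query does."""
    return con.execute(
        f"""
        SELECT '{product}', COUNT(*),
               SUM(CASE WHEN MULTI_UNIT = 1 THEN 1 ELSE 0 END),
               SUM(AB300_IND), SUM(AB10_IND), SUM(AB20_IND), SUM(AB30_IND)
        FROM maw WHERE {product}_IND = 1
        """
    ).fetchone()


stacked_rows = con.execute(STACKED_SQL).fetchall()
branch_rows = [per_branch(p) for p in ("AB4", "AB5")]
print(stacked_rows == branch_rows)
```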
      

      【Discussion】:
