【Question title】: Snowflake not sure about query time comparison
【Posted】: 2021-03-02 19:39:41
【Question】:

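I want to calculate the churn rate for a selected period compared with the previous month. In this case, I want to find the clients lost between 2020-12-01 and 2021-01-01.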

SELECT  DISTINCT("Client Ref"),"Zone","Market Place","Report Period"
FROM    "REPORT_DB"."PBI"."Revenue" d
WHERE   "Report Period" ='2020-12-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL 
AND
NOT EXISTS
        (
        SELECT  "Client Ref"
        FROM    "REPORT_DB"."PBI"."Revenue" t
        WHERE   "Report Period" ='2021-01-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL AND d."Client Ref"=t."Client Ref"
        )

Is this the right way to get it?

Regards.

【Comments】:

    Tags: time snowflake-cloud-data-platform


    【Solution 1】:

    So, adding a CTE with some dummy data and changing the column names to safe (unquoted) identifiers:

    WITH data AS (
        SELECT * FROM VALUES 
        (1,'a','UK','2020-12-01'),
        (1,'a','UK','2021-01-01'),
        (2,'a','UK','2020-12-01'),
        (3,'a','UK','2021-01-01')
        v( Client_Ref, zone, Market_Place, Report_Period)
    )
    SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
    FROM data AS d
    WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
    AND
    NOT EXISTS
            (
            SELECT  t.Client_Ref
            FROM    data t
            WHERE   t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND t.Client_Ref IS NOT NULL AND d.Client_Ref=t.Client_Ref
            );
    

    The basic form of your SQL works and returns:

    CLIENT_REF  ZONE    MARKET_PLACE    REPORT_PERIOD
    2           a       UK              2020-12-01
    

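    which is the expected result.

    This query is a correlated subquery, for which Snowflake has limited support. So although it works here, once you start changing the query you may run into the "Unsupported subquery type cannot be evaluated" error; see the SO correlated sub-query question.

    The basic query can be written in an uncorrelated form using the LEFT JOIN ... WHERE x IS NULL pattern: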

    WITH data AS (
        SELECT * FROM VALUES 
        (1,'a','UK','2020-12-01'),
        (1,'a','UK','2021-01-01'),
        (2,'a','UK','2020-12-01'),
        (3,'a','UK','2021-01-01')
        v( Client_Ref, zone, Market_Place, Report_Period)
    )
    SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
    FROM data AS d
    LEFT JOIN data AS t
        ON t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND d.Client_Ref=t.Client_Ref
    WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
    AND t.Client_Ref IS NULL;
    

    If your data source has many rows that fall outside the target result, you can rewrite the query to do some filtering first, like so:

    WITH data AS (
        SELECT * FROM VALUES 
        (1,'a','UK','2020-12-01'),
        (1,'a','UK','2021-01-01'),
        (2,'a','UK','2020-12-01'),
        (3,'a','UK','2021-01-01')
        v( Client_Ref, zone, Market_Place, Report_Period)
    ), wanted_data AS (
        SELECT DISTINCT Client_Ref, zone, Market_Place, Report_Period
        FROM data
        WHERE Report_Period BETWEEN '2020-12-01' AND '2021-01-01'
        AND Market_Place ='UK' AND Client_Ref IS NOT NULL
    )
    SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
    FROM wanted_data AS d
    LEFT JOIN wanted_data AS t
        ON t.Report_Period ='2021-01-01' AND d.Client_Ref=t.Client_Ref
    WHERE d.Report_Period ='2020-12-01' 
    AND t.Client_Ref IS NULL;
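
    Since the question asks for a churn rate rather than only the list of lost clients, the same anti-join can be folded into a single aggregate. This is a minimal sketch on the dummy data above, not something run against your table; the CTE names dec_clients and jan_clients and the output column names are made up for illustration.

    ```sql
    WITH data AS (
        SELECT * FROM VALUES
        (1,'a','UK','2020-12-01'),
        (1,'a','UK','2021-01-01'),
        (2,'a','UK','2020-12-01'),
        (3,'a','UK','2021-01-01')
        v( Client_Ref, zone, Market_Place, Report_Period)
    ), dec_clients AS (
        -- hypothetical helper: one row per client active in December
        SELECT DISTINCT Client_Ref
        FROM data
        WHERE Report_Period = '2020-12-01' AND Market_Place = 'UK' AND Client_Ref IS NOT NULL
    ), jan_clients AS (
        -- hypothetical helper: one row per client active in January
        SELECT DISTINCT Client_Ref
        FROM data
        WHERE Report_Period = '2021-01-01' AND Market_Place = 'UK' AND Client_Ref IS NOT NULL
    )
    SELECT
        COUNT(*) AS dec_client_count,
        -- a December client with no January match counts as lost
        SUM(CASE WHEN j.Client_Ref IS NULL THEN 1 ELSE 0 END) AS lost_client_count,
        -- NULLIF guards against dividing by zero when December has no clients
        SUM(CASE WHEN j.Client_Ref IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0) AS churn_rate
    FROM dec_clients AS d
    LEFT JOIN jan_clients AS j
        ON j.Client_Ref = d.Client_Ref;
    ```

    On the dummy data there are two December clients (1 and 2) and only client 2 is missing in January, so one of the two is lost.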
    

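    However, when I name the columns the way you do, e.g. "Client Ref", my SQL does not work for me, so I cannot speak to that part; the structure above is how you would build the query.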
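
    For what it is worth, here is an untested sketch of the LEFT JOIN form written against the table from the question. It assumes the columns really were created with exactly these double-quoted names: in Snowflake a quoted identifier is case-sensitive and keeps its spaces, so "Client Ref" has to match the stored name character for character, and every reference needs the d. or t. alias in front of the quoted name.

    ```sql
    SELECT DISTINCT d."Client Ref", d."Zone", d."Market Place", d."Report Period"
    FROM "REPORT_DB"."PBI"."Revenue" AS d
    LEFT JOIN "REPORT_DB"."PBI"."Revenue" AS t
        -- the January filters live in the ON clause so unmatched December rows survive
        ON  t."Report Period" = '2021-01-01'
        AND t."Market Place" = 'UK'
        AND t."Client Ref" = d."Client Ref"
    WHERE d."Report Period" = '2020-12-01'
      AND d."Market Place" = 'UK'
      AND d."Client Ref" IS NOT NULL
      -- no January match means the client was lost
      AND t."Client Ref" IS NULL;
    ```

    String values such as '2020-12-01' and 'UK' stay in single quotes; double quotes are only for identifiers.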

    【Discussion】:
