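【Question title】: Overlapping effective dates aggregation
【Posted】: 2021-07-28 15:05:28
【Question】:

I am trying to aggregate overlapping effective dates. Any gap between date ranges should produce a separate row. I am using MIN and MAX, which gives me the output shown below, but I would like to get the expected output instead.

My query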

WITH test_data AS (
    SELECT '2020-01-01' AS date_from,
           '2020-01-03' AS date_to,
           '1'          AS product
    UNION ALL
    SELECT '2020-01-05' AS date_from,
           '2020-01-07' AS date_to,
           '1'          AS product
    UNION ALL
    SELECT '2020-01-06' AS date_from,
           '2020-01-10' AS date_to,
           '1'          AS product
)
SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to)   AS date_to
FROM test_data
GROUP BY 1;

Source data

date_from date_to product
2020-01-01 2020-01-03 1
2020-01-05 2020-01-07 1
2020-01-06 2020-01-10 1

Actual output

date_from date_to product
2020-01-01 2020-01-10 1

Expected output

date_from date_to product
2020-01-01 2020-01-03 1
2020-01-05 2020-01-10 1

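Thanks in advance!

【Question comments】:

  • Could you add the query you have tried to your question? That will help others understand what you attempted and what needs fixing.
  • I think you are looking for this
  • @DominikGolebiewski . . . Tag your question with the database you are using.

Tags: sql snowflake-cloud-data-platform snowflake-schema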


【Solution 1】:

This is a type of gaps-and-islands problem. I recommend an approach like this:

SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to)   AS date_to
FROM (SELECT td.*,
             -- Running count of group starts; ordered by date_from to match the inner window
             SUM(CASE WHEN prev_date_to >= date_from THEN 0 ELSE 1 END) OVER (PARTITION BY product ORDER BY date_from) AS grp
      FROM (SELECT td.*,
                   -- Latest date_to among all earlier rows of the same product (NULL for the first row)
                   MAX(date_to) OVER (PARTITION BY product ORDER BY date_from ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_date_to
            FROM test_data td
           ) td
     ) td
GROUP BY grp, product
ORDER BY product, MIN(date_from);

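Here is a dbfiddle.

What is this doing? The innermost subquery gets the latest date_to among all preceding rows. This is used to decide whether each row is "connected" to the earlier rows or starts a new group.

The middle subquery computes a cumulative sum of the flags that mark where a new group starts, which gives every group its own number. The outer query then aggregates by that group number.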
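To make the three layers easier to follow, here is a minimal Python sketch of the same logic, run against the sample rows from the question. It is an illustration only, not part of the original answer: the function name merge_overlapping and the tuple layout (date_from, date_to, product) are assumptions made for this example.

```python
from datetime import date


def merge_overlapping(rows):
    """rows: list of (date_from, date_to, product) tuples (hypothetical layout)."""
    # Mirrors PARTITION BY product ORDER BY date_from.
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))

    groups = {}           # (product, grp) -> [min date_from, max date_to]
    current_product = None
    prev_date_to = None   # running MAX(date_to) over all earlier rows of the product
    grp = 0               # cumulative sum of the "starts a new group" flags

    for date_from, date_to, product in ordered:
        if product != current_product:
            # New partition: reset the window state.
            current_product, prev_date_to, grp = product, None, 0

        # The flag is 1 when no earlier row reaches this one (NULL counts as "no").
        starts_new_group = prev_date_to is None or prev_date_to < date_from
        grp += 1 if starts_new_group else 0

        # Outer query: MIN(date_from), MAX(date_to) per (product, grp).
        key = (product, grp)
        if key not in groups:
            groups[key] = [date_from, date_to]
        else:
            groups[key][0] = min(groups[key][0], date_from)
            groups[key][1] = max(groups[key][1], date_to)

        prev_date_to = date_to if prev_date_to is None else max(prev_date_to, date_to)

    return [(lo, hi, product) for (product, _), (lo, hi) in groups.items()]


sample = [
    (date(2020, 1, 1), date(2020, 1, 3), "1"),
    (date(2020, 1, 5), date(2020, 1, 7), "1"),
    (date(2020, 1, 6), date(2020, 1, 10), "1"),
]
result = merge_overlapping(sample)
for row in result:
    print(row)
```

Tracking the running maximum rather than only the previous row's date_to is what keeps a long range that fully contains later, shorter ranges in a single group.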

【Comments】:

  • I had never used window frames! This works like a charm! Thank you!
【Solution 2】:

Date ranges can also be merged with MATCH_RECOGNIZE.

Data preparation:

CREATE OR REPLACE TABLE test_data AS
SELECT '2020-01-01'::DATE AS date_from, '2020-01-03'::DATE AS date_to, '1'  AS product
UNION ALL
SELECT '2020-01-05'::DATE AS date_from, '2020-01-07'::DATE AS date_to, '1'  AS product
UNION ALL
SELECT '2020-01-06'::DATE AS date_from, '2020-01-10'::DATE AS date_to, '1' AS product;

Query:

SELECT * 
FROM test_data t
MATCH_RECOGNIZE(
  PARTITION BY product
  ORDER BY date_from, date_to
  MEASURES FIRST(date_from) date_from, MAX(date_to) date_to
  PATTERN(a* b)
  DEFINE a AS MAX(date_to) OVER() >= NEXT(date_from)
) mr;
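As a rough reading aid, the pattern a* b can be thought of as a greedy single pass: a row counts as "a" while the running maximum date_to still reaches the next row's date_from, and the first row where that fails is "b", which closes the match. The Python sketch below imitates that behaviour for the rows of one product. It rests on that interpretation of the pattern, and the function name merge_sorted_ranges is hypothetical.

```python
from datetime import date


def merge_sorted_ranges(rows):
    """rows: list of (date_from, date_to) tuples for a single product."""
    ordered = sorted(rows)  # ORDER BY date_from, date_to
    merged = []
    i = 0
    while i < len(ordered):
        first_from, running_max = ordered[i]
        # Rows matched as "a": the running max still reaches the next row's start.
        while i + 1 < len(ordered) and running_max >= ordered[i + 1][0]:
            i += 1
            running_max = max(running_max, ordered[i][1])
        # Row i is "b": it closes the match, so emit FIRST(date_from), MAX(date_to).
        merged.append((first_from, running_max))
        i += 1
    return merged


ranges = [
    (date(2020, 1, 1), date(2020, 1, 3)),
    (date(2020, 1, 5), date(2020, 1, 7)),
    (date(2020, 1, 6), date(2020, 1, 10)),
]
merged_ranges = merge_sorted_ranges(ranges)
for r in merged_ranges:
    print(r)
```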

db<>fiddle demo - Oracle

Related reading: Merging Overlapping Date Ranges with MATCH_RECOGNIZE by stewashton

【Comments】:

  • Looks clean and tidy! I like this solution. Thank you!