【Question title】: SQL window function to get the 2 closest events
【Posted】: 2020-02-06 18:57:25
【Question】:

I am trying to solve the "Analyzing weather patterns" problem described here (https://joins-238123.netlify.com/window-functions/):

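You're worried that hurricanes are happening more frequently, so you decide to do a little bit of analysis. For each kind of weather event, find the 2 events that occurred closest together and when they occurred.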

The table weather contains data like this:

type    day
rain    6
rain    12
thunderstorm    13
rain    21
rain    27
rain    37
rain    44
rain    54
thunderstorm    56
rain    58
rain    61
rain    65
rain    68
rain    73
rain    82
hurricane   87
rain    92
rain    95
rain    98
rain    108
thunderstorm    111
rain    118
rain    123
rain    128
rain    131
hurricane   135
rain    136
rain    140
rain    149
thunderstorm    158
rain    159
rain    167
rain    175
hurricane   178
rain    179
rain    186
rain    192
rain    200
thunderstorm    202
rain    210
rain    219
thunderstorm    222
rain    226
rain    232
thunderstorm    238
rain    241
rain    246
rain    253
thunderstorm    257
rain    257
rain    267
rain    277
rain    286
rain    295
rain    302
rain    307
thunderstorm    312
rain    316
rain    325
thunderstorm    330

I was able to come up with:

select type, day, COALESCE(day - LAG(day, 1) over (partition by type order by day), 0) as days_since_previous from weather

which gives me a result like this:

type        day days_since_previous
hurricane   87  0
hurricane   135 48
hurricane   178 43
rain        6   0
rain        12  6
rain        21  9
rain        27  6

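But I am unable to narrow the result down to the 2 closest events per type and show only the number of days between them.

How can I get the desired result, for example: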

type           day  days_since_previous
rain            61  3
hurricane       178 43
thunderstorm    238 16

【Comments】:

    Tags: sql sqlite window-functions


    【Solution 1】:

    You can use another window function to rank the rows:

    SELECT type, day, days_since_previous
    FROM (
      SELECT type, day, (day - prev_day) AS days_since_previous,
        ROW_NUMBER() OVER (PARTITION BY type ORDER BY (day - prev_day)) AS RowNum
      FROM (
        SELECT type, day,
          LAG(day, 1) OVER (PARTITION BY type ORDER BY day) AS prev_day
        FROM weather
      ) gaps
      WHERE prev_day IS NOT NULL -- Ignore "first" events, which have no previous day
    ) ranked
    WHERE RowNum = 1
    ORDER BY day
    

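    I also removed the COALESCE, because it caused the "first" event of each type to be included in the calculation: with a gap of 0, it would always rank as the closest.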
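
    One detail worth checking is ties. The query ranks rows by the gap alone, and in the question's data rain has a 3-day gap more than once (for example 58 to 61 and 65 to 68), so which of those rows gets row number 1 is not guaranteed. The following is a minimal sketch, not part of the original answer, that adds day as a tie-breaker. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (needed for window functions); the sample rows are a subset of the question's table, and the names gaps, ranked and row_num are illustrative.

```python
import sqlite3

# A subset of the question's rows, chosen so that it contains a tie:
# rain has a 3-day gap twice (58 -> 61 and 65 -> 68).
sample = [
    ("hurricane", 87), ("hurricane", 135), ("hurricane", 178),
    ("rain", 54), ("rain", 58), ("rain", 61), ("rain", 65), ("rain", 68),
    ("thunderstorm", 202), ("thunderstorm", 222),
    ("thunderstorm", 238), ("thunderstorm", 257),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (type TEXT, day INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", sample)

query = """
SELECT type, day, days_since_previous
FROM (
  SELECT type, day, day - prev_day AS days_since_previous,
         ROW_NUMBER() OVER (
           PARTITION BY type
           ORDER BY day - prev_day, day   -- tie-breaker: earliest day wins
         ) AS row_num
  FROM (
    SELECT type, day,
           LAG(day) OVER (PARTITION BY type ORDER BY day) AS prev_day
    FROM weather
  ) AS gaps
  WHERE prev_day IS NOT NULL              -- skip the first event of each type
) AS ranked
WHERE row_num = 1
ORDER BY day
"""

closest = conn.execute(query).fetchall()
for row in closest:
    print(row)
```

    On this subset the result is rain 61 3, hurricane 178 43, thunderstorm 238 16, which matches the desired output in the question. If every tied pair should be returned rather than just one, replacing ROW_NUMBER() with RANK() ordered by the gap alone would be one way to do it.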

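      【Comments】:

      【Solution 2】:

      If you don't insist on showing the day value, you can run a nested query:

      1. As you suggested, compute the gap to the previous day with an OLAP (window) function in one SELECT, either in a WITH clause or in a nested subselect. The COALESCE is not really needed.
      2. From that full-select, run a GROUP BY query.

      Like this: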

      WITH
      w_gap2prev AS (
      SELECT
        *
      , day - LAG(day) OVER(PARTITION BY type ORDER BY day) AS gap
      FROM weather
      )
      SELECT
        type
      , MIN(gap) AS days_since_previous
      FROM w_gap2prev
      WHERE gap IS NOT NULL
      GROUP BY type
      ; 
      -- out      type     | days_since_previous 
      -- out --------------+---------------------
      -- out  hurricane    |                  43
      -- out  rain         |                   3
      -- out  thunderstorm |                  16
      -- out (3 rows)
      -- out 
      -- out Time: First fetch (3 rows): 56.441 ms. All rows formatted: 56.479 ms
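
      Since the question is tagged sqlite, the day column can also be kept in the GROUP BY variant. SQLite documents that when an aggregate query uses a single MIN() or MAX(), bare (non-aggregated, non-grouped) columns take their values from the row that holds the minimum or maximum. This is SQLite-specific behavior, not standard SQL, and other databases will typically reject such a query. A minimal sketch under those assumptions, again using Python's built-in sqlite3 module (SQLite 3.25 or later) and a tie-free subset of the question's rows:

```python
import sqlite3

# A subset of the question's rows with a unique smallest gap per type.
sample = [
    ("hurricane", 87), ("hurricane", 135), ("hurricane", 178),
    ("rain", 54), ("rain", 58), ("rain", 61), ("rain", 65),
    ("thunderstorm", 202), ("thunderstorm", 222),
    ("thunderstorm", 238), ("thunderstorm", 257),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (type TEXT, day INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", sample)

query = """
WITH w_gap2prev AS (
  SELECT type, day,
         day - LAG(day) OVER (PARTITION BY type ORDER BY day) AS gap
  FROM weather
)
SELECT type,
       day,                          -- bare column: taken from the MIN(gap) row (SQLite only)
       MIN(gap) AS days_since_previous
FROM w_gap2prev
WHERE gap IS NOT NULL
GROUP BY type
ORDER BY type
"""

with_day = conn.execute(query).fetchall()
for row in with_day:
    print(row)
```

      When several rows of a type share the smallest gap, the day reported this way comes from one of them without a guaranteed choice, so a ROW_NUMBER() approach with an explicit tie-breaker is the safer option when that matters.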
      

      【Comments】:
