【Question title】: Min and max of grouped time sequences in SQL
【Posted】: 2018-09-20 06:29:52
【Question】:

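I have a large Postgres table test from which I would like to extract the consecutive sequences of the no_signal state for each mobile_id, or in other words, the length of time each individual mobile device was out of service.

In the real table the records are not ordered, which I think means that a PARTITION OVER (time, mobile_id) statement would have to be included in addition to the window function. Any advice on how to create one group per consecutive sequence, and then take the min and max of each group, would be much appreciated.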

CREATE TABLE test (mobile_id int, state varchar, time timestamp, region varchar);

INSERT INTO test (mobile_id, state, time, region ) VALUES
(1, 'active', TIMESTAMP '2018-08-09 15:00:00', 'EU'),  
(1, 'active', TIMESTAMP '2018-08-09 16:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 17:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 18:00:00', 'EU'),
(1, 'no_signal', TIMESTAMP '2018-08-09 19:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 20:00:00', 'EU'),
(1, 'inactive', TIMESTAMP '2018-08-09 21:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 22:00:00', 'EU'),
(1, 'active', TIMESTAMP '2018-08-09 23:00:00', 'EU'),
(2, 'active', TIMESTAMP '2018-08-10 00:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 01:00:00', 'EU'),
(2, 'active', TIMESTAMP '2018-08-10 02:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 03:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 04:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 05:00:00', 'EU'),
(2, 'no_signal', TIMESTAMP '2018-08-10 06:00:00', 'EU'),
(3, 'active', TIMESTAMP '2018-08-10 07:00:00', 'SA'),
(3, 'active', TIMESTAMP '2018-08-10 08:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 09:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 10:00:00', 'SA'),
(3, 'inactive', TIMESTAMP '2018-08-10 11:00:00', 'SA'),
(3, 'inactive', TIMESTAMP '2018-08-10 12:00:00', 'SA'),
(3, 'no_signal', TIMESTAMP '2018-08-10 13:00:00', 'SA');

My goal is output like this:

 mobile_id          start_time            end_time diff_time region
         1 2018-08-09 17:00:00 2018-08-09 19:00:00       120     EU
         2 2018-08-10 01:00:00 2018-08-10 01:00:00         0     EU
         2 2018-08-10 03:00:00 2018-08-10 06:00:00       180     EU
         3 2018-08-10 09:00:00 2018-08-10 10:00:00        60     SA
         3 2018-08-10 13:00:00 2018-08-10 13:00:00         0     SA

The following code does not produce the desired result, because the groups are not created correctly:

select mobile_id, region,
       least(extract(epoch from max(time) - min(time)), 0) as diff
from (select t.*,
             count(*) filter (where state = 'no_signal') over (partition by mobile_id, region order by time) as grp
      from test t
     ) t
group by mobile_id, region, grp;

【Comments】:

    Tags: sql postgresql window-functions


    【Solution 1】:

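    This is a variant of the gaps-and-islands problem. In this case, you are trying to detect, for each mobile_id, the islands of one or more consecutive records whose state is no_signal.

    This answer uses the "difference in row numbers" method. The trick is to apply ROW_NUMBER to your table in two ways. The first generates a sequence over all records, ordered by time, while the second generates a sequence per mobile_id group, and only for those records whose state is no_signal. Within a run of consecutive no_signal records both sequences advance in step, so the difference between the two row numbers stays constant and can be used to identify each island. Then we only need to aggregate and take the min/max timestamp values to get the result you want.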

    WITH cte1 AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY time) rn1
        FROM test
    ),
    cte2 AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY mobile_id ORDER BY time) rn2
        FROM test
        WHERE state = 'no_signal'
    ),
    cte3 AS (
        SELECT t1.*, t2.rn2
        FROM cte1 t1
        LEFT JOIN cte2 t2
            ON t1.mobile_id = t2.mobile_id AND t1.time = t2.time
        WHERE t1.state = 'no_signal'
    )
    
    SELECT
        mobile_id,
        MIN(time) AS start_time,
        MAX(time) AS end_time,
        EXTRACT(epoch FROM MAX(time::timestamp) - MIN(time::timestamp)) / 60 diff_time,
        region
    FROM cte3
    GROUP BY
        mobile_id,
        region,
        (rn1 - rn2)
    ORDER BY
        mobile_id,
        start_time;
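
The query above numbers rn1 across the whole table, which is enough for the sample data because the devices' timestamps do not interleave. As a hedged variant (a sketch, not the answerer's original query), both row numbers can be computed per mobile_id in a single pass, which also avoids the self-join. The CTE name numbered and the columns rn_all and rn_state are names made up for this illustration, and the sketch assumes that time is unique within each mobile_id.

```sql
-- Sketch: row-number difference computed per device in one pass.
-- rn_all   numbers every row of a device in time order.
-- rn_state numbers the rows of a device separately for each state.
-- Inside one run of identical states both counters advance together,
-- so (rn_all - rn_state) is constant for the run and changes between runs.
WITH numbered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY mobile_id ORDER BY time) AS rn_all,
        ROW_NUMBER() OVER (PARTITION BY mobile_id, state ORDER BY time) AS rn_state
    FROM test
)
SELECT
    mobile_id,
    MIN(time) AS start_time,
    MAX(time) AS end_time,
    EXTRACT(epoch FROM MAX(time) - MIN(time)) / 60 AS diff_time,  -- minutes
    region
FROM numbered
WHERE state = 'no_signal'
GROUP BY
    mobile_id,
    region,
    (rn_all - rn_state)
ORDER BY
    mobile_id,
    start_time;
```

Partitioning both counters by mobile_id means the result no longer depends on how rows from different devices happen to be ordered relative to each other.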
    


    【Discussion】:

      【Solution 2】:


      SELECT DISTINCT
          mobile_id,
          first_value(time) over (partition by ranked order by time) as start_time,        -- B
          first_value(time) over (partition by ranked order by time desc) as end_time,
          region
      FROM
      (
          SELECT *, SUM(is_diff) OVER (ORDER BY time) as ranked                          -- A
          FROM
          (
              SELECT *,
                  CASE WHEN state = lag(state) over (order by time) THEN 0 ELSE 1 END as is_diff
              FROM test 
          ) s
      ) s
      WHERE
          state = 'no_signal';
      

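      A: The problem is that you are trying to order by one column but then want to partition by another. This problem can be solved with this subquery: is_diff flags every row whose state differs from the previous row's state, and the running sum of those flags gives each run of identical states its own number. The issue has been discussed here. I was looking for a better solution, but this subquery works. It creates a column (ranked) that can be used for the window you want.

      B: Once the window has been created, your start_time and end_time can easily be calculated with first_value(time) and first_value(time) ... ORDER BY time DESC. DESC is used because it orders the window by the latest time first, so the first value of that window is the end time (last_value() does not work as expected every time).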
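
The remark about last_value() comes down to the default window frame: when a window has an ORDER BY and no explicit frame clause, PostgreSQL's frame runs from the start of the partition to the current row (and its peers), so last_value(time) returns the current row's own time rather than the last time in the partition. The following minimal sketch contrasts the default frame with an explicit full-partition frame; the demo alias and the grp and t columns are illustrative only and not part of the answer.

```sql
-- Sketch: why last_value() can surprise, shown on three inline rows.
SELECT
    grp,
    t,
    -- Default frame ends at the current row: returns t itself on every row.
    last_value(t) OVER (PARTITION BY grp ORDER BY t) AS last_default_frame,
    -- Explicit frame covering the whole partition: returns the latest t.
    last_value(t) OVER (
        PARTITION BY grp ORDER BY t
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_full_frame
FROM (VALUES
    (1, TIMESTAMP '2018-08-09 17:00:00'),
    (1, TIMESTAMP '2018-08-09 18:00:00'),
    (1, TIMESTAMP '2018-08-09 19:00:00')
) AS demo (grp, t)
ORDER BY grp, t;
```

Either approach works for the end time: first_value() over a descending order, as in the answer, or last_value() with the frame spelled out.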


      To keep the focus on the real problem, I left the diff calculation out of the query above. To add diff, you just need to wrap it in a subquery:

      SELECT 
          *,  
          EXTRACT(epoch from (end_time - start_time)) / 60 as diff
      FROM (
          -- <QUERY ABOVE>
      ) s
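
For reference, the pieces can be put together in a single statement. The sketch below is a variation rather than the answer's exact query: it partitions lag() and the running sum by mobile_id, so that runs belonging to different devices cannot be merged when their timestamps interleave, and it uses GROUP BY with MIN/MAX in place of DISTINCT with first_value(). The CTE names flagged and ranked_rows are made up for this illustration.

```sql
-- Sketch: flag state changes per device, turn the flags into run numbers,
-- then aggregate each no_signal run.
WITH flagged AS (
    SELECT
        *,
        -- 1 when the state differs from the device's previous row
        -- (or there is no previous row, where lag() yields NULL), else 0.
        CASE
            WHEN state = lag(state) OVER (PARTITION BY mobile_id ORDER BY time)
            THEN 0 ELSE 1
        END AS is_diff
    FROM test
),
ranked_rows AS (
    SELECT
        *,
        -- Running sum of the flags: constant within a run of equal states.
        SUM(is_diff) OVER (PARTITION BY mobile_id ORDER BY time) AS ranked
    FROM flagged
)
SELECT
    mobile_id,
    MIN(time) AS start_time,
    MAX(time) AS end_time,
    EXTRACT(epoch FROM MAX(time) - MIN(time)) / 60 AS diff,  -- minutes
    region
FROM ranked_rows
WHERE state = 'no_signal'
GROUP BY mobile_id, region, ranked
ORDER BY mobile_id, start_time;
```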
      

      【Discussion】:
