[Question title]: BigQuery: extracting sequences from time-series data
[Posted]: 2023-01-27 00:31:04
[Question description]:

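I have a time series in BQ with additional data, and based on some of that data I want to extract sequences from the time series for further processing.

The source table is demonstrated below: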

with dataset as (
 select
    timestamp('2023-01-25 00:00:00') as last_seen, 1 as vehicle_id, 1 as mode, 0 as activity 
    union all select timestamp('2023-01-25 00:00:02'), 1, 1, 0
    union all select timestamp('2023-01-25 00:00:04'), 1, 1, 0
    union all select timestamp('2023-01-25 00:00:00'), 2, 1, 0
    union all select timestamp('2023-01-25 00:00:02'), 2, 1, 0
    union all select timestamp('2023-01-25 00:00:04'), 2, 1, 0
    union all select timestamp('2023-01-25 00:00:06'), 1, 2, 1
    union all select timestamp('2023-01-25 00:00:08'), 1, 2, 1
    union all select timestamp('2023-01-25 00:00:10'), 1, 2, 1
    union all select timestamp('2023-01-25 00:00:12'), 1, 1, 0
    union all select timestamp('2023-01-25 00:00:14'), 1, 1, 0
    union all select timestamp('2023-01-25 00:00:16'), 1, 1, 0
    union all select timestamp('2023-01-25 00:00:12'), 2, 1, 1
    union all select timestamp('2023-01-25 00:00:14'), 2, 1, 1
    union all select timestamp('2023-01-25 00:00:17'), 2, 1, 1
)

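What I want is one result row per vehicle_id for every change in mode and/or activity, including the start and end timestamps of that sequence. For example: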

vehicle_id  mode  activity  start                end
1           1     0         2023-01-25 00:00:00  2023-01-25 00:00:04
1           2     1         2023-01-25 00:00:06  2023-01-25 00:00:10
1           1     0         2023-01-25 00:00:12  2023-01-25 00:00:16
2           1     0         2023-01-25 00:00:00  2023-01-25 00:00:04
2           1     1         2023-01-25 00:00:12  2023-01-25 00:00:17

I have tried:

select * from dataset where true
qualify ifnull(mode != lag(mode) over win or activity != lag(activity) over win or mode != lead(mode) over win or activity != lead(activity) over win, true)
window win as (partition by vehicle_id order by last_seen)

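But this returns the start and the end of each sequence on different rows, so it feels like a dead end: it could cause problems if a sequence has not ended yet.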
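
To make the problem concrete, here is a rough Python emulation of what the QUALIFY filter keeps. It is not BigQuery code, and `boundary_rows` is a hypothetical helper written only for this illustration. It assumes the filter behaves as "keep a row if it is the first or last row of its vehicle, or if its (mode, activity) differs from the previous or the next row", which is how the `ifnull(..., true)` expression evaluates on this sample data.

```python
# Rows from the question: (last_seen, vehicle_id, mode, activity).
# ISO-formatted timestamp strings sort chronologically, so plain strings suffice.
D = "2023-01-25 "
rows = [
    (D + "00:00:00", 1, 1, 0), (D + "00:00:02", 1, 1, 0), (D + "00:00:04", 1, 1, 0),
    (D + "00:00:00", 2, 1, 0), (D + "00:00:02", 2, 1, 0), (D + "00:00:04", 2, 1, 0),
    (D + "00:00:06", 1, 2, 1), (D + "00:00:08", 1, 2, 1), (D + "00:00:10", 1, 2, 1),
    (D + "00:00:12", 1, 1, 0), (D + "00:00:14", 1, 1, 0), (D + "00:00:16", 1, 1, 0),
    (D + "00:00:12", 2, 1, 1), (D + "00:00:14", 2, 1, 1), (D + "00:00:17", 2, 1, 1),
]


def boundary_rows(rows):
    """Emulate the QUALIFY filter: keep only rows at the edge of a sequence."""
    kept = []
    for vehicle_id in sorted({r[1] for r in rows}):
        # PARTITION BY vehicle_id ORDER BY last_seen
        partition = sorted(r for r in rows if r[1] == vehicle_id)
        for i, row in enumerate(partition):
            state = row[2:]  # (mode, activity)
            prev_state = partition[i - 1][2:] if i > 0 else None
            next_state = partition[i + 1][2:] if i + 1 < len(partition) else None
            # A missing neighbour (LAG/LEAD is NULL) ends up as "keep" via ifnull(..., true).
            if (prev_state is None or next_state is None
                    or state != prev_state or state != next_state):
                kept.append(row)
    return kept


for row in boundary_rows(rows):
    print(row)
```

Each sequence in the sample contributes two separate rows (its first and its last), so a further step would still be needed to pair them up. A sequence consisting of a single row would contribute only one row, which makes pairing by position unreliable.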

Thanks

[Comments]:

    Tags: sql google-bigquery time-series


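    [Solution 1]:

    You can consider the following query. The innermost SELECT flags every row whose mode or activity differs from the previous row of the same vehicle, the middle SELECT turns the running count of those flags into a sequence number (part), and the outer SELECT aggregates each sequence into one row with its start and end.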

    SELECT vehicle_id, 
           ANY_VALUE(mode) mode, ANY_VALUE(activity) activity,
           MIN(last_seen) AS start, MAX(last_seen) AS `end`
      FROM (
        SELECT *, COUNTIF(flag) OVER w1 AS part FROM (
          SELECT *, mode <> LAG(mode) OVER w0 OR activity <> LAG(activity) OVER w0 AS flag
            FROM dataset
          WINDOW w0 AS (PARTITION BY vehicle_id ORDER BY last_seen)
        ) WINDOW w1 AS (PARTITION BY vehicle_id ORDER BY last_seen)
      ) GROUP BY vehicle_id, part;
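
The same flag, running count, and group-by steps (a pattern often called "gaps and islands") can be traced outside BigQuery. The following is a minimal Python sketch, not the answer's code: `extract_sequences` is a hypothetical name, and the sample rows are copied from the question so the result can be compared with the expected table.

```python
# Rows from the question: (last_seen, vehicle_id, mode, activity).
# ISO-formatted timestamp strings sort chronologically, so plain strings suffice.
D = "2023-01-25 "
rows = [
    (D + "00:00:00", 1, 1, 0), (D + "00:00:02", 1, 1, 0), (D + "00:00:04", 1, 1, 0),
    (D + "00:00:00", 2, 1, 0), (D + "00:00:02", 2, 1, 0), (D + "00:00:04", 2, 1, 0),
    (D + "00:00:06", 1, 2, 1), (D + "00:00:08", 1, 2, 1), (D + "00:00:10", 1, 2, 1),
    (D + "00:00:12", 1, 1, 0), (D + "00:00:14", 1, 1, 0), (D + "00:00:16", 1, 1, 0),
    (D + "00:00:12", 2, 1, 1), (D + "00:00:14", 2, 1, 1), (D + "00:00:17", 2, 1, 1),
]


def extract_sequences(rows):
    """Return (vehicle_id, mode, activity, start, end) for each unbroken run."""
    prev_state = {}  # vehicle_id -> (mode, activity) of the previous row (LAG)
    part = {}        # vehicle_id -> running count of changes (COUNTIF(flag) OVER ...)
    sequences = {}   # (vehicle_id, part) -> [mode, activity, start, end]

    # Visit rows per vehicle in time order, like PARTITION BY ... ORDER BY last_seen.
    for last_seen, vehicle_id, mode, activity in sorted(rows, key=lambda r: (r[1], r[0])):
        state = (mode, activity)
        # flag is true only when a previous row exists and its state differs.
        flag = vehicle_id in prev_state and prev_state[vehicle_id] != state
        part[vehicle_id] = part.get(vehicle_id, 0) + int(flag)
        prev_state[vehicle_id] = state

        key = (vehicle_id, part[vehicle_id])  # GROUP BY vehicle_id, part
        if key not in sequences:
            sequences[key] = [mode, activity, last_seen, last_seen]
        else:
            sequences[key][3] = last_seen  # rows arrive in time order, so this is the max

    return [(vid, m, a, start, end) for (vid, _), (m, a, start, end) in sequences.items()]


for seq in extract_sequences(rows):
    print(seq)
```

Because a sequence is identified by its running change count rather than by its values, vehicle 1's two separate runs of mode 1 / activity 0 stay distinct, and a sequence that is still ongoing simply ends at its latest row.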
    

