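【Posted】: 2023-01-27 00:31:04
【Question】:
I have a time series in BigQuery with additional data attached, and based on some of that data I want to extract sequences from the time series for further processing.
The following demonstrates the source table: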
with dataset as (
select
timestamp('2023-01-25 00:00:00') as last_seen, 1 as vehicle_id, 1 as mode, 0 as activity
union all select timestamp('2023-01-25 00:00:02'), 1, 1, 0
union all select timestamp('2023-01-25 00:00:04'), 1, 1, 0
union all select timestamp('2023-01-25 00:00:00'), 2, 1, 0
union all select timestamp('2023-01-25 00:00:02'), 2, 1, 0
union all select timestamp('2023-01-25 00:00:04'), 2, 1, 0
union all select timestamp('2023-01-25 00:00:06'), 1, 2, 1
union all select timestamp('2023-01-25 00:00:08'), 1, 2, 1
union all select timestamp('2023-01-25 00:00:10'), 1, 2, 1
union all select timestamp('2023-01-25 00:00:12'), 1, 1, 0
union all select timestamp('2023-01-25 00:00:14'), 1, 1, 0
union all select timestamp('2023-01-25 00:00:16'), 1, 1, 0
union all select timestamp('2023-01-25 00:00:12'), 2, 1, 1
union all select timestamp('2023-01-25 00:00:14'), 2, 1, 1
union all select timestamp('2023-01-25 00:00:17'), 2, 1, 1
)
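What I want is one result row per vehicle_id for each time mode and/or activity changes, including the start and end timestamps of that sequence. For example: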
| vehicle_id | mode | activity | start | end |
|---|---|---|---|---|
| 1 | 1 | 0 | 2023-01-25 00:00:00 | 2023-01-25 00:00:04 |
| 1 | 2 | 1 | 2023-01-25 00:00:06 | 2023-01-25 00:00:10 |
| 1 | 1 | 0 | 2023-01-25 00:00:12 | 2023-01-25 00:00:16 |
| 2 | 1 | 0 | 2023-01-25 00:00:00 | 2023-01-25 00:00:04 |
| 2 | 1 | 1 | 2023-01-25 00:00:12 | 2023-01-25 00:00:17 |
I have tried:
select * from dataset where true
qualify ifnull(mode != lag(mode) over win or activity != lag(activity) over win or mode != lead(mode) over win or activity != lead(activity) over win, true)
window win as (partition by vehicle_id order by last_seen)
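But this returns the start and the end on separate rows, so it feels like a dead end: if a sequence has not ended yet, there is no matching end row and pairing them up could go wrong.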
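One direction that might avoid the pairing problem is the usual "gaps and islands" pattern: flag each row where mode or activity differs from the previous row of the same vehicle, turn the running count of those flags into a sequence number, and then aggregate. The following is only a sketch against the demo data above; the names `flagged`, `islands`, `changed`, and `island_id` are made up for illustration and are not part of my real schema.

```sql
with dataset as (
  select
  timestamp('2023-01-25 00:00:00') as last_seen, 1 as vehicle_id, 1 as mode, 0 as activity
  union all select timestamp('2023-01-25 00:00:02'), 1, 1, 0
  union all select timestamp('2023-01-25 00:00:04'), 1, 1, 0
  union all select timestamp('2023-01-25 00:00:00'), 2, 1, 0
  union all select timestamp('2023-01-25 00:00:02'), 2, 1, 0
  union all select timestamp('2023-01-25 00:00:04'), 2, 1, 0
  union all select timestamp('2023-01-25 00:00:06'), 1, 2, 1
  union all select timestamp('2023-01-25 00:00:08'), 1, 2, 1
  union all select timestamp('2023-01-25 00:00:10'), 1, 2, 1
  union all select timestamp('2023-01-25 00:00:12'), 1, 1, 0
  union all select timestamp('2023-01-25 00:00:14'), 1, 1, 0
  union all select timestamp('2023-01-25 00:00:16'), 1, 1, 0
  union all select timestamp('2023-01-25 00:00:12'), 2, 1, 1
  union all select timestamp('2023-01-25 00:00:14'), 2, 1, 1
  union all select timestamp('2023-01-25 00:00:17'), 2, 1, 1
),
flagged as (
  select
    *,
    -- true on a vehicle's first row (lag is null) and whenever
    -- mode or activity differs from the previous row
    ifnull(mode != lag(mode) over win or activity != lag(activity) over win, true) as changed
  from dataset
  window win as (partition by vehicle_id order by last_seen)
),
islands as (
  select
    *,
    -- running count of change flags = sequence number within each vehicle
    countif(changed) over (partition by vehicle_id order by last_seen) as island_id
  from flagged
)
select
  vehicle_id,
  mode,
  activity,
  min(last_seen) as `start`,
  max(last_seen) as `end`   -- `end` is a reserved word, hence the backticks
from islands
group by vehicle_id, mode, activity, island_id
order by vehicle_id, `start`
```

On the demo data this should produce the five rows from the table above. A sequence that is still running would simply get its latest last_seen as `end`, so start and end always land on the same row; whether that is acceptable for an unfinished sequence is an assumption on my side.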
Thanks
【Comments】:
Tags: sql google-bigquery time-series