【Question title】: PostgreSQL: aggregate records by time interval
【Posted】: 2020-06-22 16:13:11
【Question】:

I want to generate a report of the GPS capture rate by travel mode.

The travel modes used by each user are stored in the table modes:

CREATE TABLE modes
(
    user_id integer NOT NULL,
    trip_id int,
    start_time timestamp with time zone NOT NULL,
    end_time timestamp with time zone NOT NULL,
    travelmode text,
    PRIMARY KEY (user_id, start_time, end_time)
);

For example, here is sample travel-mode data for user 10 across different trips:

INSERT INTO modes (user_id, trip_id, start_time, end_time, travelmode)
VALUES (10,1000,'2008-06-18 13:28:18+01','2008-06-18 13:32:20+01','bus'),
      (10,1001,'2008-06-18 14:47:35+01','2008-06-18 15:05:31+01','bus'),
      (10,1002,'2008-08-01 02:51:47+01','2008-08-01 03:37:43+01','metro'),
      (10,1003,'2008-08-01 03:59:36+01','2008-08-01 04:30:20+01','metro'),
      (10,1004,'2008-08-01 05:20:07+01','2008-08-01 07:03:51+01','car'),
      (10,1005,'2008-08-01 07:17:08+01','2008-08-01 08:06:26+01','bus'),
      (10,1006,'2008-09-15 23:54:20+01','2008-09-16 00:02:44+01','bus'),
      (10,1007,'2008-09-16 00:10:22+01','2008-09-16 00:28:29+01','bus'),
      (10,1008,'2008-09-16 00:58:43+01','2008-09-16 01:07:14+01','metro');

Then, for each user and each trip, the user's GPS traces are recorded in the table plt_distinct:

CREATE TABLE plt_distinct
(
    user_id int,
    trip_id int, 
    logtime timestamp with time zone NOT NULL,
    lat double precision NOT NULL,
    lon double precision NOT NULL,
    alt double precision,
    PRIMARY KEY (trip_id, logtime)
);

Likewise, for the user in the sample data above, here are sample GPS traces for one particular trip:

INSERT INTO plt_distinct (user_id, trip_id, logtime, lat, lon, alt)
VALUES (10,1002,'2008-06-18 04:46:20+01',39.940474,116.346754,233),
      (10,1002,'2008-06-18 04:46:21+01',39.940491,116.346745,233),
      (10,1002,'2008-06-18 04:46:23+01',39.940526,116.346734,233),
      (10,1002,'2008-06-18 04:46:25+01',39.940573,116.346725,233),
      (10,1002,'2008-06-18 04:46:31+01',39.940815,116.346688,230),
      (10,1002,'2008-06-18 04:46:32+01',39.940861,116.346661,230),
      (10,1002,'2008-06-18 04:46:33+01',39.940941,116.346599,233),
      (10,1002,'2008-06-18 04:46:35+01',39.941109,116.34658,233),
      (10,1002,'2008-06-18 04:46:39+01',39.941464,116.346561,240),
      (10,1002,'2008-06-18 04:46:40+01',39.941558,116.346521,246),
      (10,1002,'2008-06-18 04:46:42+01',39.941816,116.346438,259);

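The sample above is a trace for the metro mode. For analysis, I want to aggregate the time intervals between consecutive GPS trace points for each mode (metro in particular, since GPS is unavailable underground).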
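The quantity being aggregated is the gap between each GPS point and the previous point of the same trip. As a minimal Python sketch (an illustration only, not SQL), assuming the sample logtime values of trip 1002 above with their common +01 offset dropped, the gaps can be computed like this:

```python
from datetime import datetime

# logtime values of trip 1002 from the sample INSERT above
# (all share the same +01 offset, so it is dropped and naive datetimes are used)
LOGTIMES = [
    "2008-06-18 04:46:20", "2008-06-18 04:46:21", "2008-06-18 04:46:23",
    "2008-06-18 04:46:25", "2008-06-18 04:46:31", "2008-06-18 04:46:32",
    "2008-06-18 04:46:33", "2008-06-18 04:46:35", "2008-06-18 04:46:39",
    "2008-06-18 04:46:40", "2008-06-18 04:46:42",
]

def gaps_in_seconds(logtimes):
    """Return the seconds between each GPS point and the one before it."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in logtimes)
    return [(cur - prev).total_seconds() for prev, cur in zip(times, times[1:])]

print(gaps_in_seconds(LOGTIMES))
# [1.0, 2.0, 2.0, 6.0, 1.0, 1.0, 2.0, 4.0, 1.0, 2.0]
```

Eleven points give ten gaps, which is where the total of 10 in the expected result comes from.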

I have provided these tables and the sample data in this DB-fiddle.

The expected result looks like this:

+-----------------------+---------------+-----------------+-----------------+----------------+
| count of metro(total) | interval (1s) | interval (2-5s) | interval(6-10s) | interval(>10s) |
+-----------------------+---------------+-----------------+-----------------+----------------+
|                    10 |             4 |               5 |               1 |              0 |
+-----------------------+---------------+-----------------+-----------------+----------------+
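The expected row can be checked by bucketing those gaps. A short Python sketch, assuming the bucket edges follow the column headers literally (exactly 1 s, 2 to 5 s, 6 to 10 s, more than 10 s) and that logtimes are whole seconds as in the sample; SECONDS is just the seconds part of each sample logtime, since all of them fall within 04:46:

```python
# Seconds part of each sample logtime of trip 1002 (all within 2008-06-18 04:46)
SECONDS = [20, 21, 23, 25, 31, 32, 33, 35, 39, 40, 42]

def bucket_counts(seconds):
    """Reproduce the expected report row from consecutive GPS timestamps.

    Assumption: gaps are whole seconds, so the integer ranges below cover every gap.
    """
    gaps = [cur - prev for prev, cur in zip(seconds, seconds[1:])]
    return {
        "count of metro(total)": len(gaps),
        "interval (1s)": sum(1 for g in gaps if g == 1),
        "interval (2-5s)": sum(1 for g in gaps if 2 <= g <= 5),
        "interval(6-10s)": sum(1 for g in gaps if 6 <= g <= 10),
        "interval(>10s)": sum(1 for g in gaps if g > 10),
    }

print(bucket_counts(SECONDS))
```

With fractional gaps (for example 5.5 s) these integer ranges would leave holes; half-open ranges such as 1 < gap <= 5 avoid that.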

【Discussion】:

    Tags: sql postgresql aggregate-functions


    【Solution 1】:

    You should use LAG() together with the FILTER clause to achieve this:

    Try this:

    select
        count(*) filter (where time_ > 0)              as "count of metro(total)",
        count(*) filter (where time_ = 1)              as "interval (1s)",
        count(*) filter (where time_ between 2 and 5)  as "interval (2-5s)",
        count(*) filter (where time_ between 6 and 10) as "interval(6-10s)",
        count(*) filter (where time_ > 10)             as "interval(>10s)"
    from
    (
        -- seconds since the previous GPS point of the same trip; the first point
        -- of each trip has no predecessor, so lag() is NULL and coalesce() yields 0,
        -- which every filter above skips
        select
            coalesce(extract(epoch from (t1.logtime - lag(t1.logtime) over (partition by t1.trip_id order by t1.logtime))), 0) as time_
        from plt_distinct t1
        inner join modes t2 on t1.user_id = t2.user_id and t1.trip_id = t2.trip_id
        where t2.travelmode = 'metro'
    ) t;

    DEMO of Fiddle
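To see what the window function in this answer does, here is a rough Python emulation of coalesce(extract(epoch from logtime - lag(logtime) over (partition by trip_id order by logtime)), 0) followed by the count(*) filter (...) columns. It only models the SQL semantics: rows are assumed to be already restricted to metro trips (the WHERE clause), logtimes are plain seconds, and trip 9999 is a hypothetical second trip added to show why partitioning by trip_id matters.

```python
from itertools import groupby
from operator import itemgetter

def lag_gaps(rows):
    """rows: (trip_id, logtime_seconds) pairs, already filtered to metro trips.

    Returns one value per row, like the subquery's time_ column: the gap to the
    previous point of the same trip, or 0 for a trip's first point.
    """
    out = []
    # sorting by (trip_id, logtime) then grouping by trip_id ~ partition by trip_id order by logtime
    for _trip, group in groupby(sorted(rows), key=itemgetter(0)):
        prev = None                                      # lag() is NULL on a partition's first row
        for _, t in group:
            out.append(0 if prev is None else t - prev)  # coalesce(..., 0)
            prev = t
    return out

def report(time_):
    """The five count(*) filter (where ...) columns of the outer query."""
    return (
        sum(1 for g in time_ if g > 0),
        sum(1 for g in time_ if g == 1),
        sum(1 for g in time_ if 2 <= g <= 5),
        sum(1 for g in time_ if 6 <= g <= 10),
        sum(1 for g in time_ if g > 10),
    )

trip_1002 = [(1002, s) for s in [20, 21, 23, 25, 31, 32, 33, 35, 39, 40, 42]]
print(report(lag_gaps(trip_1002)))                               # (10, 4, 5, 1, 0)
print(report(lag_gaps(trip_1002 + [(9999, 100), (9999, 130)])))  # (11, 4, 5, 1, 1)
```

Without the partition, the first point of trip 9999 would be compared with the last point of trip 1002, producing a spurious 58 s gap.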

      【Discussion】:

      【Solution 2】:
      with ordered_plt as (
       -- number the GPS points of each metro trip in time order
       select
        p.*
        ,row_number() over (partition by p.trip_id order by p.logtime) as rn
       from plt_distinct as p
       inner join modes
       on p.trip_id = modes.trip_id
       where modes.travelmode = 'metro'
      ), plt_with_prev as (
       -- pair every point with the point just before it in the same trip
       select
        ord.*
        ,prev.logtime as prev_logtime
       from ordered_plt as ord
       inner join ordered_plt as prev
       on ord.trip_id = prev.trip_id and ord.rn-1 = prev.rn
      ), elapsed as (
       -- epoch gives the whole gap in seconds; extract(second ...) would return
       -- only the seconds field, so a 65 s gap would be counted as 5 s
       select
        extract(epoch from (pwp.logtime - pwp.prev_logtime)) as sec
       from plt_with_prev as pwp
      )
      select
       count(*) as total
       ,sum(case when sec <= 1 then 1 else 0 end) as zero_to_1
       ,sum(case when 1 < sec and sec <= 5 then 1 else 0 end) as one_to_5
       ,sum(case when 5 < sec and sec <= 10 then 1 else 0 end) as five_to_10
       ,sum(case when 10 < sec then 1 else 0 end) as ten_to_inf
      from elapsed;

      db<>fiddle
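One detail matters for metro trips in particular, where gaps underground can easily exceed a minute: in PostgreSQL, extract(second from interval) returns only the seconds field of the interval, whereas extract(epoch from interval) returns its total length in seconds. The Python sketch below models the two with datetime.timedelta (a rough analogue for non-negative intervals, not PostgreSQL itself) and feeds them into the same bucket edges as this answer's case expressions:

```python
from datetime import timedelta

def seconds_field(gap: timedelta) -> float:
    """Rough analogue of extract(second from gap): seconds left after whole minutes."""
    return gap.total_seconds() % 60

def epoch_seconds(gap: timedelta) -> float:
    """Analogue of extract(epoch from gap): the whole gap in seconds."""
    return gap.total_seconds()

def bucket(sec: float) -> str:
    """Same edges as the answer's sum(case when ...) columns."""
    if sec <= 1:
        return "zero_to_1"
    if sec <= 5:
        return "one_to_5"
    if sec <= 10:
        return "five_to_10"
    return "ten_to_inf"

gap = timedelta(minutes=1, seconds=5)  # a 65 s gap, e.g. while underground
print(bucket(seconds_field(gap)))      # one_to_5   (only the 5 s field is seen)
print(bucket(epoch_seconds(gap)))      # ten_to_inf (the full 65 s)
```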

      Things would be much easier if you added a prev_logtime column to the table plt_distinct.
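The idea behind that suggestion: once each row stores the logtime of the previous point of its trip, every gap is a per-row subtraction and the report needs neither a window function nor a self-join. A minimal Python sketch of that idea, where GpsPoint, its prev_logtime field, and add_prev_logtime are hypothetical stand-ins for the extra column and a one-off backfill (logtimes as plain seconds):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GpsPoint:
    trip_id: int
    logtime: float
    prev_logtime: Optional[float] = None  # hypothetical extra column

def add_prev_logtime(points: List[GpsPoint]) -> None:
    """Hypothetical one-off backfill: copy the previous point's logtime within each trip."""
    last_seen = {}
    for q in sorted(points, key=lambda q: (q.trip_id, q.logtime)):
        q.prev_logtime = last_seen.get(q.trip_id)  # None for a trip's first point
        last_seen[q.trip_id] = q.logtime

def count_gaps(points: List[GpsPoint], lo: float, hi: float) -> int:
    """Per-row check, no ordering needed: lo <= logtime - prev_logtime <= hi."""
    return sum(
        1 for p in points
        if p.prev_logtime is not None and lo <= p.logtime - p.prev_logtime <= hi
    )

points = [GpsPoint(1002, s) for s in [20, 21, 23, 25, 31, 32, 33, 35, 39, 40, 42]]
add_prev_logtime(points)
print(count_gaps(points, 1, 1), count_gaps(points, 2, 5), count_gaps(points, 6, 10))  # 4 5 1
```

The trade-off is that the stored column has to be maintained whenever points are inserted or removed out of order.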

        【Discussion】:
