【Question title】: Generate time series with daily statistics using a PostgreSQL query
【Posted】: 2017-08-26 09:44:28
【Question】:

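I find myself having to write a SQL query that is (for me) rather complex, and I can't seem to wrap my head around it.

I have a table called orders and a related table order_state_history that records the state of those orders over time (see below).

I now need to generate a series of rows, one per day, containing the number of orders that are in a particular state at the end of that day (see the expected result below). Additionally, I only want to consider orders with orders.type = 1.

The data lives in a PostgreSQL database. I have already found out how to generate a time series with GENERATE_SERIES(DATE '2001-01-01', CURRENT_DATE, '1 DAY'::INTERVAL) days, which lets me produce rows for dates on which no state change was recorded.

My current approach is to join orders, order_state_history and the generated days series together, filter out all rows with DATE(order_state_history.timestamp) > DATE(days), and then somehow get each order's final state on that day with first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC), but this is where my little bit of SQL experience abandons me.

I just can't get my head around this problem.

Can this even be solved in a single query, or would I be better off computing the data with some kind of smart script that executes one query per day? What would be a sensible approach to this problem?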

orders===            
id       type        
10000    1        
10001    1        
10002    2        
10003    2        
10004    1        


order_state_history===            
order_id    index    timestamp           new_state
10000       1        01.01.2001 12:00    NEW
10000       2        02.01.2001 13:00    ACTIVE
10000       3        03.01.2001 14:00    DONE
10001       1        02.01.2001 13:00    NEW
10002       1        03.01.2001 14:00    NEW
10002       2        05.01.2001 10:00    ACTIVE
10002       3        05.01.2001 14:00    DONE
10003       1        07.01.2001 04:00    NEW
10004       1        05.01.2001 14:00    NEW
10004       2        10.01.2001 17:30    DONE


Expected result===            
date          new_orders    active_orders    done_orders
01.01.2001    1             0                0
02.01.2001    1             1                0
03.01.2001    1             0                1
04.01.2001    1             0                1
05.01.2001    2             0                1
06.01.2001    2             0                1
07.01.2001    2             0                1
08.01.2001    2             0                1
09.01.2001    2             0                1
10.01.2001    1             0                2
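
For reference, the expected result above can be reproduced with a brute-force computation. The following is only a sketch in Python, not part of the original question: the names ORDERS, HISTORY and daily_report are made up for illustration, and the sample data is copied from the tables above. For each day it takes the latest state change of every type-1 order up to the end of that day and counts the states.

```python
from datetime import date, datetime, timedelta

# Sample data copied from the tables above (names are hypothetical).
ORDERS = {10000: 1, 10001: 1, 10002: 2, 10003: 2, 10004: 1}  # id -> type

HISTORY = [  # (order_id, timestamp, new_state)
    (10000, datetime(2001, 1, 1, 12, 0), "NEW"),
    (10000, datetime(2001, 1, 2, 13, 0), "ACTIVE"),
    (10000, datetime(2001, 1, 3, 14, 0), "DONE"),
    (10001, datetime(2001, 1, 2, 13, 0), "NEW"),
    (10002, datetime(2001, 1, 3, 14, 0), "NEW"),
    (10002, datetime(2001, 1, 5, 10, 0), "ACTIVE"),
    (10002, datetime(2001, 1, 5, 14, 0), "DONE"),
    (10003, datetime(2001, 1, 7, 4, 0), "NEW"),
    (10004, datetime(2001, 1, 5, 14, 0), "NEW"),
    (10004, datetime(2001, 1, 10, 17, 30), "DONE"),
]


def daily_report(first_day, last_day, order_type=1):
    """Return (day, new, active, done) for every day in the range."""
    rows = []
    day = first_day
    while day <= last_day:
        latest = {}  # order_id -> (timestamp, state) of the last change up to this day
        for order_id, ts, state in HISTORY:
            if ORDERS[order_id] != order_type or ts.date() > day:
                continue
            if order_id not in latest or ts > latest[order_id][0]:
                latest[order_id] = (ts, state)
        states = [state for _, state in latest.values()]
        rows.append((day, states.count("NEW"), states.count("ACTIVE"), states.count("DONE")))
        day += timedelta(days=1)
    return rows


for row in daily_report(date(2001, 1, 1), date(2001, 1, 10)):
    print(row[0].strftime("%d.%m.%Y"), *row[1:])
```

This rescans the whole history for every day, so it is only useful as a cross-check on small data, not as a replacement for a set-based query.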

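【Comments】:

  • Please check the expected result (why are there 2 new orders on 03.01?) and add the next expected rows, at least up to 05.01.
  • I have added all relevant rows. There are two new orders on 03.01 because new orders (10001 and 10002) were created on both 02.01 and 03.01. Order 10001 stays in the NEW state, so it is counted on all following days. The counts are totals: the result column new_orders counts all orders that are in the NEW state at the end of the day, regardless of whether their state changed that day.
  • But 10002 is of type 2, so it shouldn't be counted?
  • You are right, of course. I have updated the data accordingly.

Tags: sql postgresql time-series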


【Solution 1】:

Step 1. Compute a cumulative sum of states for each order, using the values NEW = 1, ACTIVE = 1, DONE = 2:

select 
    order_id, timestamp::date as day, 
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)

 order_id |    day     | state 
----------+------------+-------
    10000 | 2001-01-01 |     1
    10000 | 2001-01-02 |     2
    10000 | 2001-01-03 |     4
    10001 | 2001-01-02 |     1
    10004 | 2001-01-05 |     1
    10004 | 2001-01-10 |     3
(6 rows)
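
To make the encoding easier to follow, here is a rough Python sketch of what the step 1 query computes. It is an illustration only, not the answer's method: ORDER_TYPES, HISTORY, WEIGHT and cumulative_states are hypothetical names, the data is copied from the question, and the sketch assumes that no order has two state changes with the same timestamp.

```python
from datetime import datetime
from itertools import groupby

# Sample data copied from the question (names are hypothetical).
ORDER_TYPES = {10000: 1, 10001: 1, 10002: 2, 10003: 2, 10004: 1}

HISTORY = [  # (order_id, timestamp, new_state)
    (10000, datetime(2001, 1, 1, 12, 0), "NEW"),
    (10000, datetime(2001, 1, 2, 13, 0), "ACTIVE"),
    (10000, datetime(2001, 1, 3, 14, 0), "DONE"),
    (10001, datetime(2001, 1, 2, 13, 0), "NEW"),
    (10002, datetime(2001, 1, 3, 14, 0), "NEW"),
    (10002, datetime(2001, 1, 5, 10, 0), "ACTIVE"),
    (10002, datetime(2001, 1, 5, 14, 0), "DONE"),
    (10003, datetime(2001, 1, 7, 4, 0), "NEW"),
    (10004, datetime(2001, 1, 5, 14, 0), "NEW"),
    (10004, datetime(2001, 1, 10, 17, 30), "DONE"),
]

# Same weights as the CASE expression: DONE counts 2, everything else 1.
WEIGHT = {"NEW": 1, "ACTIVE": 1, "DONE": 2}


def cumulative_states(order_type=1):
    """Return (order_id, day, state) with a running sum of weights per order."""
    events = sorted(
        (e for e in HISTORY if ORDER_TYPES[e[0]] == order_type),
        key=lambda e: (e[0], e[1]),  # partition by order_id, order by timestamp
    )
    result = []
    for order_id, group in groupby(events, key=lambda e: e[0]):
        running = 0
        for _, ts, new_state in group:
            running += WEIGHT[new_state]
            result.append((order_id, ts.date(), running))
    return result


for row in cumulative_states():
    print(*row)
```

Because DONE weighs 2, the running sum tells the two paths apart: NEW then DONE ends at 3, while NEW, ACTIVE, DONE ends at 4.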

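Step 2. Based on the state from step 1, compute a transition matrix for each order (1 means the order was created as NEW, 2 means NEW->ACTIVE, 3 means NEW->DONE, 4 means ACTIVE->DONE):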

select 
    order_id, day, state,
    case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    case when state > 2 then 1 else 0 end as done
from (
    select 
        order_id, timestamp::date as day, 
        sum(case new_state when 'DONE' then 2 else 1 end) over w as state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
    window w as (partition by order_id order by timestamp)
    ) s

 order_id |    day     | state | new | active | done 
----------+------------+-------+-----+--------+------
    10000 | 2001-01-01 |     1 |   1 |      0 |    0
    10000 | 2001-01-02 |     2 |  -1 |      1 |    0
    10000 | 2001-01-03 |     4 |   0 |     -1 |    1
    10001 | 2001-01-02 |     1 |   1 |      0 |    0
    10004 | 2001-01-05 |     1 |   1 |      0 |    0
    10004 | 2001-01-10 |     3 |  -1 |      0 |    1
(6 rows)
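
The three CASE expressions can also be read as a small lookup from state code to per-column deltas. The Python sketch below is illustrative only: STEP1 is the step 1 output typed in by hand, and the function name deltas is made up.

```python
from datetime import date

# Output of step 1, copied from above: (order_id, day, state).
STEP1 = [
    (10000, date(2001, 1, 1), 1),
    (10000, date(2001, 1, 2), 2),
    (10000, date(2001, 1, 3), 4),
    (10001, date(2001, 1, 2), 1),
    (10004, date(2001, 1, 5), 1),
    (10004, date(2001, 1, 10), 3),
]


def deltas(state):
    """Map a cumulative state code to (new, active, done) changes."""
    # 1: order enters NEW; 2 or 3: order leaves NEW
    new = 1 if state == 1 else -1 if state in (2, 3) else 0
    # 2: order enters ACTIVE; 4: order leaves ACTIVE
    active = 1 if state == 2 else -1 if state == 4 else 0
    # 3 or 4: order enters DONE
    done = 1 if state > 2 else 0
    return new, active, done


step2 = [(order_id, day, state, *deltas(state)) for order_id, day, state in STEP1]

for row in step2:
    print(*row)
```

Each row sums to the net movement of one order: every transition removes it from one column and adds it to another, except the initial NEW, which only adds.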

Step 3. Compute the cumulative sum of each state over a series of days:

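-- DISTINCT collapses days with several state changes: under the default window
-- frame, all rows of the same day are peers and share the same running totals.
select distinct
    day::date,
    sum(new) over w as new,
    sum(active) over w as active,
    sum(done) over w as done
-- one row per calendar day, so days without any state change still appear
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day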
left join (
    select 
        order_id, day, state,
        case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
        case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
        case when state > 2 then 1 else 0 end as done
    from (
        select 
            order_id, timestamp::date as day, 
            sum(case new_state when 'DONE' then 2 else 1 end) over w as state
        from order_state_history h
        join orders o on o.id = h.order_id
        where o.type = 1
        window w as (partition by order_id order by timestamp)
        ) s
    ) s
using(day)
window w as (order by day)
order by 1

    day     | new | active | done 
------------+-----+--------+------
 2001-01-01 |   1 |      0 |    0
 2001-01-02 |   1 |      1 |    0
 2001-01-03 |   1 |      0 |    1
 2001-01-04 |   1 |      0 |    1
 2001-01-05 |   2 |      0 |    1
 2001-01-06 |   2 |      0 |    1
 2001-01-07 |   2 |      0 |    1
 2001-01-08 |   2 |      0 |    1
 2001-01-09 |   2 |      0 |    1
 2001-01-10 |   1 |      0 |    2
(10 rows)   
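
As a cross-check of the final step, the running totals can be recomputed outside the database. This is a minimal Python sketch under stated assumptions: STEP2 holds the delta columns from step 2 typed in by hand, running_totals is a hypothetical helper, and any deltas dated before the first day of the range would be ignored.

```python
from collections import defaultdict
from datetime import date, timedelta

# Delta rows from step 2, copied from above: (day, new, active, done).
STEP2 = [
    (date(2001, 1, 1), 1, 0, 0),
    (date(2001, 1, 2), -1, 1, 0),
    (date(2001, 1, 3), 0, -1, 1),
    (date(2001, 1, 2), 1, 0, 0),
    (date(2001, 1, 5), 1, 0, 0),
    (date(2001, 1, 10), -1, 0, 1),
]


def running_totals(first_day, last_day):
    """Return (day, new, active, done) running totals for each day in the range."""
    # Add up all deltas that fall on the same day.
    per_day = defaultdict(lambda: [0, 0, 0])
    for day, *delta in STEP2:
        for i, value in enumerate(delta):
            per_day[day][i] += value

    totals = [0, 0, 0]
    rows = []
    day = first_day
    while day <= last_day:
        # Days without any state change contribute zero and repeat the totals.
        delta = per_day.get(day, (0, 0, 0))
        totals = [t + d for t, d in zip(totals, delta)]
        rows.append((day, *totals))
        day += timedelta(days=1)
    return rows


for row in running_totals(date(2001, 1, 1), date(2001, 1, 10)):
    print(*row)
```

Iterating over the full day range plays the role of generate_series with the left join: days that have no delta rows still produce an output row.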

【Discussion】:
