【Question title】: SQL: How to get change in value between two adjacent rows in a time series?
【Posted】: 2018-09-24 23:43:32
【Question】:

I have a table called NEW_YORK_TEMPERATURES, for example:

Note: spacing added to show the different locations and dates.

datetime,           location, min_temp, max_temp

2018-01-01 12:00:00,  seneca,     76.1,     76.5
2018-01-01 12:10:00,  seneca,     76.1,     76.5
2018-01-01 12:20:00,  seneca,     76.2,     76.6
2018-01-01 12:30:00,  seneca,     76.1,     76.6
2018-01-01 12:40:00,  seneca,     76.1,     76.5

2018-01-02 12:00:00,  seneca,     76.6,     77.3
2018-01-02 12:10:00,  seneca,     76.6,     77.3
2018-01-02 12:20:00,  seneca,     76.6,     77.3
2018-01-02 12:30:00,  seneca,     76.6,     77.3
2018-01-02 12:40:00,  seneca,     76.6,     77.3

2018-01-01 12:00:00, conesus,     66.1,     66.5
2018-01-01 12:10:00, conesus,     66.1,     66.5
2018-01-01 12:20:00, conesus,     66.2,     66.6
2018-01-01 12:30:00, conesus,     66.1,     66.6
2018-01-01 12:40:00, conesus,     66.1,     66.5

2018-01-02 12:00:00, conesus,     66.4,     66.7
2018-01-02 12:10:00, conesus,     66.4,     66.8
2018-01-02 12:20:00, conesus,     66.4,     66.9
2018-01-02 12:30:00, conesus,     66.4,     66.9
2018-01-02 12:40:00, conesus,     66.4,     66.9   

2018-01-01 12:00:00, ontario,     63.1,     63.5
2018-01-01 12:10:00, ontario,     63.1,     63.5
2018-01-01 12:20:00, ontario,     63.2,     63.6
2018-01-01 12:30:00, ontario,     63.1,     63.6
2018-01-01 12:40:00, ontario,     63.1,     63.5

2018-01-02 12:00:00, ontario,     63.3,     63.8
2018-01-02 12:10:00, ontario,     63.3,     63.8    
2018-01-02 12:20:00, ontario,     63.3,     63.8
2018-01-02 12:30:00, ontario,     63.3,     63.7
2018-01-02 12:40:00, ontario,     63.3,     63.7

I need a way to get the change in range between two consecutive timestamps. The first step is to create the spread column by doing the following:

select 
    datetime, 
    location, 
    min_temp, 
    max_temp, 
    max_temp - min_temp as range 
from NEW_YORK_TEMPERATURES 
order by datetime

This gives a table like:

datetime,           location, min_temp, max_temp, range

2018-01-01 12:00:00,  seneca,     76.1,     76.5,   0.4
2018-01-01 12:10:00,  seneca,     76.1,     76.5,   0.4
2018-01-01 12:20:00,  seneca,     76.2,     76.6,   0.4
2018-01-01 12:30:00,  seneca,     76.1,     76.6,   0.5
2018-01-01 12:40:00,  seneca,     76.1,     76.5,   0.4

2018-01-02 12:00:00,  seneca,     76.6,     77.3,   0.7
2018-01-02 12:10:00,  seneca,     76.6,     77.3,   0.7
2018-01-02 12:20:00,  seneca,     76.6,     77.3,   0.7
2018-01-02 12:30:00,  seneca,     76.6,     77.3,   0.7
2018-01-02 12:40:00,  seneca,     76.6,     77.3,   0.7

2018-01-01 12:00:00, conesus,     66.1,     66.5,   0.4
2018-01-01 12:10:00, conesus,     66.1,     66.5,   0.4
2018-01-01 12:20:00, conesus,     66.2,     66.6,   0.4
2018-01-01 12:30:00, conesus,     66.1,     66.6,   0.5
2018-01-01 12:40:00, conesus,     66.1,     66.5,   0.4

2018-01-02 12:00:00, conesus,     66.4,     66.7,   0.3
2018-01-02 12:10:00, conesus,     66.4,     66.8,   0.4
2018-01-02 12:20:00, conesus,     66.4,     66.9,   0.5
2018-01-02 12:30:00, conesus,     66.4,     66.9,   0.5
2018-01-02 12:40:00, conesus,     66.4,     66.9,   0.5

2018-01-01 12:00:00, ontario,     63.1,     63.5,   0.4
2018-01-01 12:10:00, ontario,     63.1,     63.5,   0.4
2018-01-01 12:20:00, ontario,     63.2,     63.6,   0.4
2018-01-01 12:30:00, ontario,     63.1,     63.6,   0.5
2018-01-01 12:40:00, ontario,     63.1,     63.5,   0.4

2018-01-02 12:00:00, ontario,     63.3,     63.8,   0.5
2018-01-02 12:10:00, ontario,     63.3,     63.8,   0.5   
2018-01-02 12:20:00, ontario,     63.3,     63.8,   0.5
2018-01-02 12:30:00, ontario,     63.3,     63.7,   0.4
2018-01-02 12:40:00, ontario,     63.3,     63.7,   0.4

But how can I get the change in range between adjacent rows at the same location, so that my result looks like:

datetime,           location, min_temp, max_temp, range, change_in_range

2018-01-01 12:00:00,  seneca,     76.1,     76.5,   0.4              nan
2018-01-01 12:10:00,  seneca,     76.1,     76.5,   0.4              0.0
2018-01-01 12:20:00,  seneca,     76.2,     76.6,   0.4              0.0
2018-01-01 12:30:00,  seneca,     76.1,     76.6,   0.5              0.1
2018-01-01 12:40:00,  seneca,     76.1,     76.5,   0.4             -0.1

2018-01-02 12:00:00,  seneca,     76.6,     77.3,   0.7              0.0
2018-01-02 12:10:00,  seneca,     76.6,     77.3,   0.7              0.0
2018-01-02 12:20:00,  seneca,     76.6,     77.3,   0.7              0.0
2018-01-02 12:30:00,  seneca,     76.6,     77.3,   0.7              0.0
2018-01-02 12:40:00,  seneca,     76.6,     77.3,   0.7              0.0

2018-01-01 12:00:00, conesus,     66.1,     66.5,   0.4              nan
2018-01-01 12:10:00, conesus,     66.1,     66.5,   0.4              0.0
2018-01-01 12:20:00, conesus,     66.2,     66.6,   0.4              0.0
2018-01-01 12:30:00, conesus,     66.1,     66.6,   0.5              0.1
2018-01-01 12:40:00, conesus,     66.1,     66.5,   0.4              0.0

2018-01-02 12:00:00, conesus,     66.4,     66.7,   0.3             -0.1
2018-01-02 12:10:00, conesus,     66.4,     66.8,   0.4              0.1
2018-01-02 12:20:00, conesus,     66.4,     66.9,   0.5              0.1
2018-01-02 12:30:00, conesus,     66.4,     66.9,   0.5              0.0
2018-01-02 12:40:00, conesus,     66.4,     66.9,   0.5              0.0

2018-01-01 12:00:00, ontario,     63.1,     63.5,   0.4              nan
2018-01-01 12:10:00, ontario,     63.1,     63.5,   0.4              0.0
2018-01-01 12:20:00, ontario,     63.2,     63.6,   0.4              0.0
2018-01-01 12:30:00, ontario,     63.1,     63.6,   0.5              0.1
2018-01-01 12:40:00, ontario,     63.1,     63.5,   0.4             -0.1

2018-01-02 12:00:00, ontario,     63.3,     63.8,   0.5              0.1
2018-01-02 12:10:00, ontario,     63.3,     63.8,   0.5              0.0
2018-01-02 12:20:00, ontario,     63.3,     63.8,   0.5              0.0
2018-01-02 12:30:00, ontario,     63.3,     63.7,   0.4             -0.1
2018-01-02 12:40:00, ontario,     63.3,     63.7,   0.4              0.0

【Question comments】:

  • The 0 for "2018-01-02 12:00:00/seneca" doesn't make sense.

Tags: sql postgresql


【Solution 1】:

You can use lag() with a lateral join:

select t.*, v.range,
       (range - lag(range) over (partition by location order by datetime)) as change_in_range
from NEW_YORK_TEMPERATURES t cross join lateral
     (values (max_temp - min_temp)) v(range)
order by location, datetime;

Note: this represents the first value for each location as NULL rather than nan.

Also, for the output you want, you need order by location, datetime.
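
If the literal nan from the expected output matters, one possible approach is to wrap the difference in COALESCE. This is only a sketch: it assumes min_temp and max_temp are numeric or double precision columns, and it relies on PostgreSQL accepting 'NaN' as a floating-point input value.

```sql
-- Sketch: same query as above, but the first row of each location
-- shows NaN instead of NULL.
-- Assumption: min_temp and max_temp are numeric or double precision.
select t.*, v.range,
       coalesce(v.range - lag(v.range) over (partition by location order by datetime),
                'NaN'::double precision) as change_in_range
from NEW_YORK_TEMPERATURES t cross join lateral
     (values (max_temp - min_temp)) v(range)
order by location, datetime;
```

NULL is the more conventional way in SQL to say "there is no previous row", so this substitution is probably only worth doing when a downstream consumer expects NaN.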

【Comments】:

    【Solution 2】:

    One option uses LAG:

    SELECT 
        datetime,
        location,
        min_temp,
        max_temp,
        max_temp - min_temp AS range,
        (max_temp - min_temp) - LAG(max_temp - min_temp) OVER
            (PARTITION BY location ORDER BY datetime) change_in_range
    FROM NEW_YORK_TEMPERATURES 
    ORDER BY
        location, datetime;
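
    To try the query without creating the table, the sketch below inlines four of the seneca rows from the question. The CTE name sample_temps is hypothetical, and the unquoted decimal literals are treated by PostgreSQL as numeric, so the subtraction is exact.

```sql
-- Sketch: a few seneca rows from the question, inlined so that no table is needed.
-- The CTE name sample_temps is hypothetical.
WITH sample_temps (datetime, location, min_temp, max_temp) AS (
    VALUES
        (TIMESTAMP '2018-01-01 12:20:00', 'seneca', 76.2, 76.6),
        (TIMESTAMP '2018-01-01 12:30:00', 'seneca', 76.1, 76.6),
        (TIMESTAMP '2018-01-01 12:40:00', 'seneca', 76.1, 76.5),
        (TIMESTAMP '2018-01-02 12:00:00', 'seneca', 76.6, 77.3)
)
SELECT
    datetime,
    location,
    max_temp - min_temp AS range,
    (max_temp - min_temp) - LAG(max_temp - min_temp) OVER
        (PARTITION BY location ORDER BY datetime) AS change_in_range
FROM sample_temps
ORDER BY
    location, datetime;
-- range / change_in_range, in row order:
--   0.4 / NULL
--   0.5 / 0.1
--   0.4 / -0.1
--   0.7 / 0.3
```

    Because the partition is by location only, the 2018-01-02 12:00:00 row is compared with the last reading of the previous day and comes out as 0.3 (0.7 - 0.4), not the 0.0 shown in the question's expected output.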
    


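    【Comments】:

      【Solution 3】:

      Since you seem to want the "range difference" partitioned by date (truncated from the datetime field) and location, you need to add the truncated date to your LAG function. With a CTE, you end up with something like: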

          WITH temps_with_ranges AS
          (
              SELECT *,
                  date_trunc('day', datetime) AS dt,
                  max_temp - min_temp AS "range"
              FROM NEW_YORK_TEMPERATURES
          )
          SELECT *,
              "range" - LAG("range") OVER (PARTITION BY location, dt ORDER BY datetime) AS change_in_range
          FROM temps_with_ranges
          ORDER BY location, datetime;
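
      Since it is not obvious from the question which partitioning is wanted, the two choices can be put side by side. The sketch below is one way to do that with named windows; the window names by_location and by_location_day and the two output column names are hypothetical, and it assumes datetime is a timestamp column.

```sql
-- Sketch: both partitioning choices in one result set.
-- The window names and output column names are hypothetical.
SELECT
    datetime,
    location,
    max_temp - min_temp AS "range",
    (max_temp - min_temp) - LAG(max_temp - min_temp) OVER by_location     AS change_by_location,
    (max_temp - min_temp) - LAG(max_temp - min_temp) OVER by_location_day AS change_by_location_day
FROM NEW_YORK_TEMPERATURES
WINDOW
    by_location     AS (PARTITION BY location ORDER BY datetime),
    by_location_day AS (PARTITION BY location, date_trunc('day', datetime) ORDER BY datetime)
ORDER BY location, datetime;
```

      The two columns should differ only on the first row of each day after a location's first day: change_by_location compares that row with the last reading of the previous day, while change_by_location_day is NULL there because the day starts a new partition.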
      

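      【Comments】:

      • How is this answer any different from the one I gave yesterday?
      • @TimBiegeleisen: Yours doesn't partition by location and date. Also, I think the CTE makes things a little cleaner.
      • Take a close look at the expected range differences in the original question. There is no partition by date, only by location.
      • @TimBiegeleisen, I agree that the expected result doesn't clearly show that the OP wants to partition by date, but the OP also put visual separation between the date groups and the location groups. So I think it is worth offering at least as an option. Besides, as Gordon Linoff pointed out, the expected results aren't consistent anyway.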