How to count consecutive periods in a timeseries above/below threshold?

【Question title】: How to count consecutive periods in a timeseries above/below threshold?
【Posted】: 2019-11-13 11:19:57
【Question】:

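I have a one-year dataset of values, and I want to detect and count the periods of consecutive values above/below a pre-specified threshold. I simply want to return the length of each period of consecutive above/below-threshold values. I found code online that does almost exactly what I want (shown below, the function named "fire_season_length"), except that it fails to return the final consecutive period before the end of the dataset (at the end of the year).

I believe the problem is that a period of consecutive values is only reported once the series flips from above (below) the threshold to below (above) the threshold.

This is the function I use to count consecutive periods above/below the threshold: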

def fire_season_length(ts, threshold):

    ntot_ts = ts.count() #total number of values in ts (timeseries)
    n_gt_threshold = ts[ts >= threshold].count() #number of values at or above threshold

    type_day = 0 #below threshold
    type_day = 1 #meets or exceeds threshold

    type_prev_day = 0 #initialize first day 
    storage_n_cons_days = [[],[]]   #[[cons days above threshold], [cons days below threshold]]
    n_cons_days = 0

    for cur_day in ts: #current day in timeseries

        if cur_day >= threshold:
            type_cur_day = 1
            if type_cur_day == type_prev_day: #same state as previous day
                n_cons_days += 1
            else: #state changed since previous day
                storage_n_cons_days[1].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day
        else:
            type_cur_day = 0
            if type_cur_day == type_prev_day:
                n_cons_days += 1
            else:
                storage_n_cons_days[0].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day



    return ntot_ts, n_gt_threshold, storage_n_cons_days

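This is the output when I run a time series through the function. I have annotated the plot to show that there are 7 periods of consecutive values, but the returned array [[13, 185, 30], [24, 78, 12]] (which represents [[periods above threshold], [periods below threshold]]) lists only six such periods. Period 7 appears to be missing from the output, which is consistent with the output for the other time series I tested with this function. See the annotated plot.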

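So my question is: how can I get my code to return the final period of consecutive values, even if the series has not flipped to the other state (above/below the threshold)?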

【Question discussion】:

Tags: python for-loop count

【Solution 1】:

You can do this with a combination of accumulate() and Counter():

import random
from itertools import accumulate
from collections import Counter

ts = [ random.randint(1,100) for _ in range(15) ]

threshold = 50
# running count of state changes: every element of one run gets the same number
groups = accumulate([0]+[(a>=threshold) != (b>=threshold) for a,b in zip(ts,ts[1:])])
# (run number, run length) pairs in chronological order
counts = sorted(Counter(groups).items())
# even run numbers share the state of ts[0]; odd ones have the opposite state
above  = [ c for n,c in counts if (n%2==0) == (ts[0]>=threshold) ]
below  = [ c for n,c in counts if (n%2==0) != (ts[0]>=threshold) ]

print("data ",ts)
print("above",above)
print("below",below)

Sample output:

data  [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
above [1, 2, 1, 1, 2]
below [1, 1, 4, 2]

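It works as follows:

First, identify the positions where the state switches between above and below.
A state change is marked with True (1); positions with no change are False (0).
The cumulative sum of these 1s and 0s produces a sequence of run numbers that increases at each state change and repeats for positions where the state stays the same.
The Counter class then counts how many times each repeated value occurs, which gives the length of each run of consecutive identical states.
Sorting the counter items restores the chronological order of the state changes.
Depending on the state of the first item, the even values all correspond to either the above or the below state, and the odd values correspond to the opposite state.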
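To make these steps concrete, here is a small trace over the sample data shown above. The intermediate names `flips`, `run_ids`, and `run_lengths` are introduced only for illustration and do not appear in the original snippet.

```python
from collections import Counter
from itertools import accumulate

ts = [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
threshold = 50

# 1 where the above/below state differs from the previous element, else 0
flips = [int((a >= threshold) != (b >= threshold)) for a, b in zip(ts, ts[1:])]
# flips   == [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# running total of flips: all elements of one run share the same number
run_ids = list(accumulate([0] + flips))
# run_ids == [0, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8]

# (run number, run length) pairs, sorted back into chronological order
run_lengths = sorted(Counter(run_ids).items())
print(run_lengths)
```

Because `ts[0]` is above the threshold here, the even run numbers (0, 2, 4, 6, 8) are the "above" runs and the odd ones are the "below" runs. The last run (number 8) is counted like any other, so nothing is lost at the end of the series.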

[Edit] A more direct approach is to use groupby, keyed on whether each value is above (True) or below (False) the threshold:

from itertools import groupby

threshold = 50
changes = [ (c,len([*g])) for c,g in groupby(ts,lambda t:(t>=threshold))]

print('above:',[n for above,n in changes if above])
print('below:',[n for above,n in changes if not above])

Output:

above: [1, 2, 1, 1, 2]
below: [1, 1, 4, 2]
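If you would rather keep the explicit loop from the question, one possible repair is to store the run that is still open when the loop finishes. The following is only a sketch under a few assumptions: the function name `consecutive_run_lengths` is hypothetical, it returns only the `[[above], [below]]` lists rather than the full tuple from the question, and it uses `None` as the initial state so that a series starting above the threshold does not record a spurious zero-length run.

```python
def consecutive_run_lengths(ts, threshold):
    """Return [above_runs, below_runs], including the final open run."""
    runs = [[], []]    # [[runs at/above threshold], [runs below threshold]]
    prev_above = None  # state of the previous value; None before the first one
    run_len = 0

    for value in ts:
        cur_above = value >= threshold
        if cur_above == prev_above:
            run_len += 1
        else:
            if prev_above is not None:
                # a run just ended: file it under the state it belonged to
                runs[0 if prev_above else 1].append(run_len)
            prev_above = cur_above
            run_len = 1

    # the step the loop in the question lacks: store the run still open at the end
    if prev_above is not None:
        runs[0 if prev_above else 1].append(run_len)
    return runs


sample = [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
print(consecutive_run_lengths(sample, 50))
```

On the sample data this gives the same above/below lists as the groupby version, which is a quick way to check that the final period is no longer dropped.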

【Discussion】: