Non-fixed rolling window: answers

【Question Title】: Non fixed rolling window
【Posted】: 2021-01-26 23:22:34
【Question Description】:

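I want to apply a rolling window over a list, but instead of a fixed window length, the window size for each position is supplied by a second list.
Something like this: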

l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
get_custom_roling( l1, l2, np.average)

The result would be:

[5, 4, 5.5, 5, 6.67, ....]

6.67 is the average of the 3 elements 10, 2, 8.

I implemented a slow solution; any idea for making it faster is welcome :)

import numpy as np
import pandas as pd


def get_the_list(end_point, number_points):
    """
    Return the indices of the window ending at end_point, newest first.

    example: get_the_list(6, 3) ==> [6, 5, 4]
    example: get_the_list(9, 5) ==> [9, 8, 7, 6, 5]
    """
    if np.isnan(number_points):
        return []
    number_points = int(number_points)
    return list(range(end_point, end_point - number_points, -1))


def get_idx(s):
    # Pair each window size with its position, then expand to index lists.
    return (get_the_list(*elem) for elem in enumerate(s))


def get_custom_roling(s, ss, funct):
    # s must be a pd.Series: it is indexed by position and s.index is reused.
    agg_stuff = [s.iloc[elem] for elem in get_idx(ss)]
    res_agg_stuff = [funct(elem) for elem in agg_stuff]
    return pd.Series(data=res_agg_stuff, index=s.index)

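【Question Discussion】:

How do you know which elements of list 1 to use?
List 2 gives the number of elements to use. For example, the 5th element of list2 is 3, so the 5th element of the returned list is the average of the sublist/window [10, 2, 8] of list1. It is the same behavior as pd.Series.rolling(window=...), except that window is a function rather than a fixed number.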

Tags: python pandas rolling-computation

【Solution 1】:

Pandas custom window rolling lets you vary the window size from row to row.

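A brief explanation: the start and end arrays hold the index bounds used to slice the data. Window i covers rows start[i] up to, but not including, end[i]. For the example data: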

#start = [0  0  1  2  2  2  5  5  4  7]
#end =   [1  2  3  4  5  6  7  8  9 10]
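
The bounds above can be reproduced by hand. The following is a minimal pure-Python sketch, assuming each window ends at the current row (inclusive) and reaches back l2[i] rows. The helper name window_bounds is made up for illustration and is not part of pandas, and the clamp at 0 is an added assumption for windows that would reach before the first row.

```python
l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]


def window_bounds(sizes):
    """Return (start, end) lists; window i is the half-open slice [start[i], end[i])."""
    end = [i + 1 for i in range(len(sizes))]
    # Assumption: a window never reaches before the first row, so clamp at 0.
    start = [max(e - size, 0) for e, size in zip(end, sizes)]
    return start, end


start, end = window_bounds(l2)
print(start)  # [0, 0, 1, 2, 2, 2, 5, 5, 4, 7]
print(end)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Each result is an aggregate over the slice l1[start[i]:end[i]].
means = [sum(l1[s:e]) / (e - s) for s, e in zip(start, end)]
print(means[4])  # mean of [8, 2, 10]
```

This loop is only meant to show what the two arrays encode; the pandas indexer hands the same bounds to compiled aggregation routines.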

The signature of get_window_bounds is defined by BaseIndexer; pandas supplies the arguments when the rolling operation runs.

import pandas as pd
import numpy as np
from pandas.api.indexers import BaseIndexer
from typing import Optional, Tuple


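class CustomIndexer(BaseIndexer):

    def get_window_bounds(self,
                          num_values: int = 0,
                          min_periods: Optional[int] = None,
                          center: Optional[bool] = None,
                          closed: Optional[str] = None,
                          step: Optional[int] = None
                          ) -> Tuple[np.ndarray, np.ndarray]:
        # pandas 1.5+ expects the `step` parameter; remove it on older versions.

        # Window i is the half-open slice [start[i], end[i]): it ends at row i
        # (inclusive) and reaches back custom_name_whatever[i] rows.
        end = np.arange(1, num_values+1, dtype=np.int64)
        start = end - np.array(self.custom_name_whatever, dtype=np.int64)
        return start, end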

df = pd.DataFrame({"l1": [5, 3, 8, 2, 10, 12, 13, 15, 22, 28],
                   "l2": [1, 2, 2, 2,  3,  4,  2,  3,  5,  3]})

indexer = CustomIndexer(custom_name_whatever=df.l2)

df["variable_mean"] = df.l1.rolling(indexer).mean()

print(df)

Output:

   l1  l2  variable_mean
0   5   1       5.000000
1   3   2       4.000000
2   8   2       5.500000
3   2   2       5.000000
4  10   3       6.666667
5  12   4       8.000000
6  13   2      12.500000
7  15   3      13.333333
8  22   5      14.400000
9  28   3      21.666667
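
The question passes an arbitrary function (np.average) rather than asking only for a mean. As a hedged sketch, the same indexer idea can be combined with Rolling.apply to cover that case. The names VariableWindowIndexer, window_lengths, and custom_rolling are hypothetical; the step parameter assumes pandas 1.5 or later; clipping the start at 0 and passing min_periods=1 are added assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class VariableWindowIndexer(BaseIndexer):
    """Hypothetical indexer: window i ends at row i and spans window_lengths[i] rows."""

    # Assumption: pandas >= 1.5, where get_window_bounds must accept `step`.
    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        end = np.arange(1, num_values + 1, dtype=np.int64)
        lengths = np.asarray(self.window_lengths, dtype=np.int64)
        # Assumption: windows reaching before the first row are truncated.
        start = np.clip(end - lengths, 0, None)
        return start, end


def custom_rolling(values, lengths, func):
    """Apply func to each variable-length window of values."""
    indexer = VariableWindowIndexer(window_lengths=lengths)
    # raw=True hands each window to func as a NumPy array.
    return pd.Series(values).rolling(indexer, min_periods=1).apply(func, raw=True)


l1 = [5, 3, 8, 2, 10, 12, 13, 15, 22, 28]
l2 = [1, 2, 2, 2, 3, 4, 2, 3, 5, 3]
rolling_max = custom_rolling(l1, l2, np.max)
rolling_avg = custom_rolling(l1, l2, np.average)
print(rolling_max.tolist())
print(rolling_avg.round(2).tolist())
```

Built-in aggregations such as .mean() are likely to be faster than .apply with a Python callable, so this form is mainly useful when no built-in aggregation fits.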

【Discussion】:

On a 43k-row DataFrame, I went from 120 seconds to 300 milliseconds! Thank you very much.