【Question title】: Pandas sliding window over a dataframe column
【Posted】: 2021-03-04 21:04:11
【Question】:

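I'm new to pandas. I'm using pandas to read a CSV file of timestamped records into a dataframe. The data has the following columns:

timestamp COLUMN_A COLUMN_B COLUMN_C

After reading the data into a dataframe, I want to run a windowing function over COLUMN_C; the function should return the timestamped values of the column.

I wrote something that works for iterables: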

import collections
import itertools


def sliding_window_iter(iterable, size):
    """Iterate through iterable using a sliding window of several elements.

    Creates an iterable where each element is a tuple of `size`
    consecutive elements from `iterable`, advancing by 1 element each
    time. For example:

    >>> list(sliding_window_iter([1, 2, 3, 4], 2))
    [(1, 2), (2, 3), (3, 4)]
    """
    iterable = iter(iterable)
    window = collections.deque(
        itertools.islice(iterable, size-1),
        maxlen=size
    )
    for item in iterable:
        window.append(item)
        yield tuple(window)

How can I modify this to work on a column of a dataframe?

【Comments】:

Tags: python pandas


【Solution 1】:

It is simpler to take consecutive slices of the dataframe. Since you want overlapping windows [(1, 2), (2, 3), (3, 4), ...], you could write:

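def sliding_window_iter(series, size):
    """Yield overlapping windows of `size` rows; series is a column of a dataframe."""
    for start_row in range(len(series) - size + 1):
        # .iloc makes the slice explicitly positional, whatever the index type
        yield series.iloc[start_row:start_row + size]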

Usage:

import pandas as pd

df = pd.DataFrame({'A': list(range(100, 501, 100)),
                   'B': list(range(-20, -15)),
                   'C': [0, 1, 2, None, 4]},
                  index=pd.date_range('2021-01-01', periods=5))

list(sliding_window_iter(df['C'], 2))

Output:

[2021-01-01    0.0
 2021-01-02    1.0
 Freq: D, Name: C, dtype: float64,
 2021-01-02    1.0
 2021-01-03    2.0
 Freq: D, Name: C, dtype: float64,
 2021-01-03    2.0
 2021-01-04    NaN
 Freq: D, Name: C, dtype: float64,
 2021-01-04    NaN
 2021-01-05    4.0
 Freq: D, Name: C, dtype: float64]
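
To do something with each window, you can loop over the generator. Here is a minimal sketch, assuming you want one aggregate per window, labelled with the timestamp of the window's last row. The name `window_sums` is made up for illustration, and the function is repeated so the snippet runs on its own:

```python
import pandas as pd


def sliding_window_iter(series, size):
    """Yield overlapping windows of `size` consecutive rows."""
    for start_row in range(len(series) - size + 1):
        yield series.iloc[start_row:start_row + size]


df = pd.DataFrame({'C': [0, 1, 2, None, 4]},
                  index=pd.date_range('2021-01-01', periods=5))

# Hypothetical aggregation: sum each window and key the result by the
# timestamp of the window's last row.
window_sums = pd.Series({window.index[-1]: window.sum()
                         for window in sliding_window_iter(df['C'], 2)})
print(window_sums)
```

Note that `Series.sum()` skips NaN by default, so a window containing a missing value still produces a number; pass `skipna=False` if you would rather get NaN for those windows.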

It also works if you pass in multiple columns:

list(sliding_window_iter(df.loc[:, ['A', 'C']], 2))

Output:
[              A    C
 2021-01-01  100  0.0
 2021-01-02  200  1.0,
               A    C
 2021-01-02  200  1.0
 2021-01-03  300  2.0,
               A    C
 2021-01-03  300  2.0
 2021-01-04  400  NaN,
               A    C
 2021-01-04  400  NaN
 2021-01-05  500  4.0]
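
As an alternative to the hand-written generator, pandas has a built-in `rolling` method. This is a different technique from the slicing above, shown only as a sketch. It assumes pandas 1.1 or later, where rolling objects can be iterated; the names `rolling_mean` and `full_windows` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'C': [0, 1, 2, None, 4]},
                  index=pd.date_range('2021-01-01', periods=5))

# Aggregate directly: one value per row, labelled with that row's timestamp.
# With the default min_periods, a window with fewer than 2 non-NaN values
# gives NaN.
rolling_mean = df['C'].rolling(window=2).mean()

# Iterate over the windows themselves. Iteration also yields the shorter
# partial window at the start, so keep only the full-size ones.
full_windows = [w for w in df['C'].rolling(window=2) if len(w) == 2]

print(rolling_mean)
print(len(full_windows))
```

The aggregating form is usually the better fit when the window function is a standard reduction (mean, sum, max, and so on); the explicit generator is handy when you need the raw slices.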

【Comments】:
