How to efficiently up-sample hourly averages in a Series?

【Question title】: How to efficiently up-sample hourly averages in a Series?
【Posted】: 2021-02-25 17:45:41
【Question description】:

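I have looked at pandas' SQL-like window functionality and at "rolling". As far as I can tell, though, I cannot put a condition on the timestamps in the index, but maybe I am wrong. So far I have been writing this very inefficient code to get the hourly average as a window function. Does anyone know a faster, better way?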

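import datetime as dt
import pandas as pd

def avg_on_hour(data: pd.Series) -> pd.Series:
    # Build one averaged slice per clock hour, then concatenate once at the end
    # (Series.append was removed in pandas 2.0).
    pieces = []
    delta = dt.timedelta(hours=1)
    # Start on the hour so that a series beginning mid-hour does not skip the last hour
    this_time = data.index.min().floor('h')
    end_date = data.index.max()
    while this_time <= end_date:
        # Select every timestamp that falls on this date and in this hour
        mask = (data.index.date == this_time.date()) & (data.index.hour == this_time.hour)
        day_slice = data[mask].copy()
        if not day_slice.empty:
            # Overwrite each value in the hour with that hour's mean
            day_slice.iloc[:] = day_slice.mean()
            pieces.append(day_slice)
        this_time = this_time + delta
    if not pieces:
        return pd.Series(dtype=float)
    return pd.concat(pieces, verify_integrity=True)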

Example: [image not available]

【Question discussion】:

Pandas supports rolling over datetimes, since the series is datetime-indexed: series.rolling('1H').mean().

Tags: pandas time-series pandas-groupby

【Solution 1】:

Pandas supports rolling over datetimes, since the series is datetime-indexed:

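import numpy as np
import pandas as pd

# sample data: 10 random values spaced 7 minutes apart
# ('7T' and '1H' are the older alias spellings; pandas 2.2+ prefers '7min' and '1h')
np.random.seed(1)
size = 10
s = pd.Series(np.random.rand(size),
              index=pd.date_range('2020-01-01', freq='7T', periods=size))

# rolling mean over a trailing 1-hour window
s.rolling('1H').mean()

Output: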

2020-01-01 00:00:00    0.417022
2020-01-01 00:07:00    0.568673
2020-01-01 00:14:00    0.379154
2020-01-01 00:21:00    0.359948
2020-01-01 00:28:00    0.317310
2020-01-01 00:35:00    0.279815
2020-01-01 00:42:00    0.266450
2020-01-01 00:49:00    0.276339
2020-01-01 00:56:00    0.289720
2020-01-01 01:03:00    0.303252
Freq: 7T, dtype: float64
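
Note that the 1-hour rolling window is a trailing one: each value is the mean of the samples in the hour ending at that timestamp, not the mean of the clock hour it falls in. A minimal sketch that checks the last entry by hand, assuming the same sample data as above and pandas' default right-closed time window (the names `rolled`, `window` and `manual` are only illustrative, and the lowercase 'min'/'h' aliases are the spellings preferred by recent pandas):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.rand(10),
              index=pd.date_range('2020-01-01', freq='7min', periods=10))

rolled = s.rolling('1h').mean()

# The window ending at 01:03 covers (00:03, 01:03], so the 00:00 sample drops out
t = pd.Timestamp('2020-01-01 01:03')
window = s[(s.index > t - pd.Timedelta(hours=1)) & (s.index <= t)]
manual = window.mean()

print(len(window), manual, rolled.loc[t])
```

This is why the rolling result changes at every timestamp instead of staying constant within an hour.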

Update: from your comment, it looks like you are after groupby:

s.groupby(s.index.floor('H')).transform('mean')

or

s.groupby(pd.Grouper(freq='H')).transform('mean')

Output:

2020-01-01 00:00:00    0.289720
2020-01-01 00:07:00    0.289720
2020-01-01 00:14:00    0.289720
2020-01-01 00:21:00    0.289720
2020-01-01 00:28:00    0.289720
2020-01-01 00:35:00    0.289720
2020-01-01 00:42:00    0.289720
2020-01-01 00:49:00    0.289720
2020-01-01 00:56:00    0.289720
2020-01-01 01:03:00    0.538817
Freq: 7T, dtype: float64
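
As a quick sanity check, the sketch below (same sample data, with illustrative variable names `by_floor` and `by_grouper`) suggests that the two groupby spellings give the same result, and that every timestamp within a clock hour carries that hour's mean while the original index is preserved:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.rand(10),
              index=pd.date_range('2020-01-01', freq='7min', periods=10))

# Group by the timestamp floored to the hour, then broadcast the mean back
by_floor = s.groupby(s.index.floor('h')).transform('mean')

# Same grouping expressed with a time-based Grouper
by_grouper = s.groupby(pd.Grouper(freq='h')).transform('mean')

# The first nine samples fall in hour 00, the last one in hour 01
first_hour_mean = s.iloc[:9].mean()
print(by_floor.iloc[0], first_hour_mean, by_floor.iloc[-1])
```

Because transform returns a result aligned to the input, no separate up-sampling step is needed.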

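【Discussion】:

Yes, I probably did not express it well. I want a series that has the same value at every timestamp belonging to the same hour of the same day. So it works like an aggregate function, but is then up-sampled back to the original interval. For example, if I find that the mean of the series between 2020-08-01 11:00:00 and 2020-08-01 12:00:00 is (say) 500, I want a column in which I get 500 at every timestamp in that interval. I have added an example image to the question.
That is not rolling. That is just groupby. See the updated answer.
Yes, fantastic. I was not aware of the usage of 'transform' and 'pd.Grouper'.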
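
The "aggregate, then up-sample back to the original interval" reading described in the comments can also be spelled out as two explicit steps. This is only a sketch of an alternative, not the method from the answer: it assumes the same sample data, and `hourly` and `broadcast` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.rand(10),
              index=pd.date_range('2020-01-01', freq='7min', periods=10))

# Step 1: aggregate to one mean per clock hour
hourly = s.resample('h').mean()

# Step 2: look up each original timestamp's hour and put the values
# back on the original index
broadcast = pd.Series(hourly.reindex(s.index.floor('h')).to_numpy(),
                      index=s.index)

# Should match the one-line groupby/transform version
expected = s.groupby(s.index.floor('h')).transform('mean')
print(np.allclose(broadcast.to_numpy(), expected.to_numpy()))
```

The groupby/transform form does both steps at once, which is usually the more compact choice.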