Resampling pandas time series to a predefined grid: answers

【Question title】: Resampling pandas time series to a predefined grid
【Posted】: 2017-03-22 21:24:07
【Question description】:

Suppose I have a weekly time series constructed like this:

import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
weekly = ts.resample('W').mean()

Suppose you also have another series at daily frequency that you want to aggregate weekly as well, but aligned with the first one.

rng2 = pd.date_range('17/1/2011', periods=72, freq='D')
ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)

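Note that the second series does not start on the same date, so simply resampling ts2 would leave the two weekly series misaligned. It would be nice if resample could accept a DatetimeIndex to resample onto, but AFAICT that is not possible.

How would you do this?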

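【Question discussion】:

In your example, you should change the index of ts2 to rng2 (I can't make the edit myself, since it would be shorter than 6 characters...

Tags: python pandas time-series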

【Solution 1】:

@FLab's answer is the best IMO. If you want the two series to have exactly the same index, you can also do this:

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2011', periods=72, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
weekly = ts.resample('W').mean()

rng2 = pd.date_range('17/1/2011', periods=72, freq='D')
ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)

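# Reindex ts2 onto the daily index of ts (dates missing from ts2 become NaN),
# then take weekly means on that shared daily grid.
ts2.reindex(ts.index).resample('W').mean()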

Out[14]: 
2011-01-02         NaN
2011-01-09         NaN
2011-01-16         NaN
2011-01-23   -0.073253
2011-01-30   -0.065030
2011-02-06   -0.037297
2011-02-13    0.101782
2011-02-20   -0.386027
2011-02-27    0.131906
2011-03-06    0.107101
2011-03-13   -0.030496
Freq: W-SUN, dtype: float64

If you don't have access to the earlier index, just use @FLab's approach, for example:

ts.resample('W-SUN').mean()
ts2.resample('W-SUN').mean()

You can pass any of these anchored aliases here:

Alias   Description
W-SUN   weekly frequency (sundays). Same as ‘W’
W-MON   weekly frequency (mondays)
W-TUE   weekly frequency (tuesdays)
W-WED   weekly frequency (wednesdays)
W-THU   weekly frequency (thursdays)
W-FRI   weekly frequency (fridays)
W-SAT   weekly frequency (saturdays)
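
As a rough sketch of the anchored-alias approach: resample both series with the same alias, then reindex the second weekly result onto the first weekly index so the two share exactly the same grid. The fixed seed, the ISO-formatted start dates, and the names `weekly2` and `ts2_on_grid` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

rng = pd.date_range('2011-01-01', periods=72, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

rng2 = pd.date_range('2011-01-17', periods=72, freq='D')
ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)

# Same anchored alias for both series, so the weekly bins end on the same weekday.
weekly = ts.resample('W-SUN').mean()
weekly2 = ts2.resample('W-SUN').mean()

# Put the second weekly series onto the first one's grid.
# Weeks that ts2 does not cover come out as NaN; weeks beyond the grid are dropped.
ts2_on_grid = weekly2.reindex(weekly.index)

print(pd.DataFrame({'ts': weekly, 'ts2': ts2_on_grid}))
```

Aggregating first and reindexing afterwards keeps every weekly value a true mean of the daily observations, which is the property the reindex-then-resample variant above also preserves.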

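【Discussion】:

Steven, suppose you don't have access to the original ts, only to the original weekly series.
I just tested it. If you reindex directly with weekly.index, it works.
Careful: if you reindex directly with weekly.index, the code still runs, but you get different (wrong) results. ts2.reindex(weekly.index) contains only the observations of ts2 that fall on Sundays. So when you resample, you are not taking an average; you are just keeping the Sunday values.
If you don't have access to the earlier index, just use resample and specify which weekday to anchor on: ts.resample('W-SUN').mean()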
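
The pitfall described in this thread can be reproduced with a small sketch. It assumes a deterministic ramp (0, 1, 2, ...) in place of random data so the difference is easy to read, and a hypothetical `grid` that stands in for weekly.index.

```python
import numpy as np
import pandas as pd

rng2 = pd.date_range('2011-01-17', periods=72, freq='D')
# assumption: a ramp 0.0, 1.0, 2.0, ... instead of random values
ts2 = pd.Series(np.arange(len(rng2), dtype=float), index=rng2)

# assumption: stands in for weekly.index (Sundays from 2011-01-02 to 2011-03-13)
grid = pd.date_range('2011-01-02', periods=11, freq='W-SUN')

# Pitfall: reindexing the daily data onto the weekly grid keeps only the Sunday rows,
# so the "mean" of each weekly bin is just that single Sunday value.
sundays_only = ts2.reindex(grid).resample('W-SUN').mean()

# Aggregate the daily data first, then align the result to the grid.
weekly_means = ts2.resample('W-SUN').mean().reindex(grid)

print(pd.DataFrame({'sundays_only': sundays_only, 'weekly_means': weekly_means}))
```

For the week ending 2011-01-23, the daily values are 0 through 6, so the first column shows 6.0 (the Sunday value) while the second shows 3.0 (the weekly mean).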

【Solution 2】:

When resampling to weekly frequency, you can also specify which weekday your weeks are anchored on: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#anchored-offsets.

So you can do this:

ts2_resamples = ts2.resample(weekly.index.freq).mean()

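【Discussion】:

This works if the weekly series was created by aggregation. If it was not, the freq of its index will not be set.
Yes, you're right, but I think that is the assumption in the example. If your code relies on the frequency of the series, I think it is better to set it explicitly. In other words, even if you read a weekly series anchored on Sunday from a CSV, I would check whether pandas recognizes the frequency, and otherwise set it explicitly myself.
You may also be interested in this question if you need to set a specific frequency: stackoverflow.com/questions/27607974/…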
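
One possible way to do the check suggested above is sketched here. It assumes a weekly series whose index was built from plain date strings (as it might be after loading a CSV), so its freq is None, and it falls back on `pd.infer_freq` to recover the alias. The sample values and the name `ts2_weekly` are made up for illustration.

```python
import numpy as np
import pandas as pd

# assumption: a weekly series whose index carries no freq, as if read from a CSV
dates = pd.to_datetime(['2011-01-02', '2011-01-09', '2011-01-16', '2011-01-23'])
weekly = pd.Series([0.1, 0.2, 0.3, 0.4], index=dates)

freq = weekly.index.freq
if freq is None:
    # infer_freq returns a string alias, or None if no regular frequency is found
    freq = pd.infer_freq(weekly.index)
if freq is None:
    raise ValueError('could not determine the frequency of the weekly index')

rng2 = pd.date_range('2011-01-17', periods=72, freq='D')
ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)

# Resample the daily series with the frequency recovered from the weekly index.
ts2_weekly = ts2.resample(freq).mean()
print(freq)
print(ts2_weekly.head())
```

Raising when no frequency can be inferred is a design choice in this sketch; setting the alias by hand (for example 'W-SUN') would be the explicit alternative mentioned in the comment above.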