偏移读数偏移的时间序列答案

【问题标题】：Shift time series where readings are offset偏移读数偏移的时间序列
【发布时间】：2021-04-07 19:15:00
【问题描述】：

我想从整点开始每 15 分钟获取一次读数，给定一组每小时读取但与每小时偏移分钟的读数。

我的第一种方法是使用 resample 到 15 分钟，但我没有得到预期的结果：

因此，如果读数是整点的，重新采样就可以正常工作：

left_key =  pd.to_datetime(['2020-12-01 00:00',
               '2020-12-01 01:00',
               '2020-12-01 02:00',
               '2020-12-01 03:00',
               '2020-12-01 04:00',
               '2020-12-01 05:00'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

满足我的需要：

2020-12-01 00:00:00    12.000000
2020-12-01 00:15:00    11.777455
2020-12-01 00:30:00    12.079464
2020-12-01 00:45:00    12.370313
2020-12-01 01:00:00    12.000000
2020-12-01 01:15:00    12.918527
2020-12-01 01:30:00    13.175893

但如果读数偏离小时：

left_key =  pd.to_datetime(['2020-12-01 00:06',
               '2020-12-01 01:06',
               '2020-12-01 02:06',
               '2020-12-01 03:06',
               '2020-12-01 04:06',
               '2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

现在我没有数据

2020-12-01 00:00:00   NaN
2020-12-01 00:15:00   NaN
2020-12-01 00:30:00   NaN
2020-12-01 00:45:00   NaN
2020-12-01 01:00:00   NaN

如果我每小时重新采样一次，它只会将读数向后移动

resampled = left_data.resample('H')
resampled.interpolate(method='spline', order=2)


2020-12-01 00:00:00    12
2020-12-01 01:00:00    12
2020-12-01 02:00:00    13
2020-12-01 03:00:00    15
2020-12-01 04:00:00    16
2020-12-01 05:00:00    15

有没有办法让重新采样来插入读数，这样我就可以得到正确的小时值？（这个问题有更好的标题吗！）

更新

虽然这些解决方案有效，但它不适合处理大量数据。 1000 行对我的机器来说太多了！即使减少初始重采样大小也需要大量内存和时间来完成。

这是这个问题的另一个解决方案：Interpolate one time series onto custom time series

# create a new index for the ranges of datetimes required
starts = df.index.min()
starts = datetime(starts.year, starts.month, starts.day, starts.hour,15*(starts.minute // 15))
master = pd.date_range(starts, df.index.max(), freq="15min")

# will need this to identify original data rows later
df['tag'] = True

# merge with original data and interpolate missing rows
idx = df.index.union(master)
df2 = df.reindex(idx).interpolate('index')

# now remove the things we don't want
df2.drop(df2.index[0], inplace=True)  # first value will be NaN (unless has real data)
# use the tag column to remove the original data and then drop that column
df2 = df2[df2['tag'].isna()]
df2.drop(columns=['tag',], inplace=True)

这要快得多！

【问题讨论】：

标签： pandas time-series

【解决方案1】：

好的。这不是所有解决方案中最漂亮的，但它在过去对我有用。这是一个技巧，包括在应用您想要的之前以可忽略的时间间隔重新采样两次。首先，您需要按时（日期）设置索引。

left_key =  pd.to_datetime(['2020-12-01 00:06',
               '2020-12-01 01:06',
               '2020-12-01 02:06',
               '2020-12-01 03:06',
               '2020-12-01 04:06',
               '2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15])

df = pd.DataFrame({'Dates':left_key , 'Values':left_data})
df.set_index('Dates', inplace=True)    
df1 = df.resample('1ms').interpolate(method='spline', order=2).resample('15min').first()

给了

                        Values
Dates                         
2020-12-01 00:00:00  12.000000
2020-12-01 00:15:00  11.653527
2020-12-01 00:30:00  11.960000
2020-12-01 00:45:00  12.255313
2020-12-01 01:00:00  12.539464
2020-12-01 01:15:00  12.812455
2020-12-01 01:30:00  13.074286
2020-12-01 01:45:00  13.324955
2020-12-01 02:00:00  13.564464
2020-12-01 02:15:00  13.792813
2020-12-01 02:30:00  14.010000
2020-12-01 02:45:00  14.216027
2020-12-01 03:00:00  14.410893
2020-12-01 03:15:00  14.594598
2020-12-01 03:30:00  14.767143
2020-12-01 03:45:00  14.928527
2020-12-01 04:00:00  15.078750
2020-12-01 04:15:00  15.217812
2020-12-01 04:30:00  15.345714
2020-12-01 04:45:00  15.462455
2020-12-01 05:00:00  15.568036

然后，与原始 df 连接

frames = [df, df1]
df2 = pd.concat(frames)
df2.sort_values('Dates')

                        Values
Dates                         
2020-12-01 00:00:00  12.000000
2020-12-01 00:06:00  12.000000
2020-12-01 00:15:00  11.653527
2020-12-01 00:30:00  11.960000
2020-12-01 00:45:00  12.255313
2020-12-01 01:00:00  12.539464
2020-12-01 01:06:00  12.000000
2020-12-01 01:15:00  12.812455
2020-12-01 01:30:00  13.074286
2020-12-01 01:45:00  13.324955
2020-12-01 02:00:00  13.564464
2020-12-01 02:06:00  13.000000
2020-12-01 02:15:00  13.792813
2020-12-01 02:30:00  14.010000
2020-12-01 02:45:00  14.216027
2020-12-01 03:00:00  14.410893
2020-12-01 03:06:00  15.000000
2020-12-01 03:15:00  14.594598
2020-12-01 03:30:00  14.767143
2020-12-01 03:45:00  14.928527
2020-12-01 04:00:00  15.078750
2020-12-01 04:06:00  16.000000
2020-12-01 04:15:00  15.217812
2020-12-01 04:30:00  15.345714
2020-12-01 04:45:00  15.462455
2020-12-01 05:00:00  15.568036
2020-12-01 05:06:00  15.000000

【讨论】：

不错！谢谢。