【Question title】: Shift time series where readings are offset
【Posted】: 2021-04-07 19:15:00
【Question】:

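I want to get a reading every 15 minutes, starting on the hour, from a set of readings that are taken hourly but are offset from the hour by a few minutes.

My first approach was to resample to 15 minutes, but I did not get the result I expected.

If the readings fall exactly on the hour, resampling works fine: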

import pandas as pd  # the spline interpolation below also needs SciPy installed

left_key = pd.to_datetime(['2020-12-01 00:00',
               '2020-12-01 01:00',
               '2020-12-01 02:00',
               '2020-12-01 03:00',
               '2020-12-01 04:00',
               '2020-12-01 05:00'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

This gives me what I need:

2020-12-01 00:00:00    12.000000
2020-12-01 00:15:00    11.777455
2020-12-01 00:30:00    12.079464
2020-12-01 00:45:00    12.370313
2020-12-01 01:00:00    12.000000
2020-12-01 01:15:00    12.918527
2020-12-01 01:30:00    13.175893

But if the readings are offset from the hour:

left_key =  pd.to_datetime(['2020-12-01 00:06',
               '2020-12-01 01:06',
               '2020-12-01 02:06',
               '2020-12-01 03:06',
               '2020-12-01 04:06',
               '2020-12-01 05:06'])
left_data = pd.Series([12,12,13,15,16,15], index=left_key, name='master')

resampled = left_data.resample('15min')
resampled.interpolate(method='spline', order=2)

Now I get no data:

2020-12-01 00:00:00   NaN
2020-12-01 00:15:00   NaN
2020-12-01 00:30:00   NaN
2020-12-01 00:45:00   NaN
2020-12-01 01:00:00   NaN

If I resample hourly, the readings are simply moved back to the hour:

resampled = left_data.resample('H')
resampled.interpolate(method='spline', order=2)


2020-12-01 00:00:00    12
2020-12-01 01:00:00    12
2020-12-01 02:00:00    13
2020-12-01 03:00:00    15
2020-12-01 04:00:00    16
2020-12-01 05:00:00    15
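
A likely explanation for the all-NaN result, at least in pandas versions from around the time of this question: before interpolating, `resample(...).interpolate(...)` first places the readings on the new regular grid and keeps only values whose timestamps fall exactly on a grid point. None of the `:06` readings land on a 15-minute boundary, so nothing is left to interpolate between. In the hourly case the source and target frequencies match, and pandas appears to relabel the readings onto the new grid instead, which would account for the shift. The sketch below checks the intermediate step with `asfreq()`; the names `on_grid` and `all_missing` are only illustrative.

```python
import pandas as pd

left_key = pd.to_datetime(['2020-12-01 00:06',
                           '2020-12-01 01:06',
                           '2020-12-01 02:06',
                           '2020-12-01 03:06',
                           '2020-12-01 04:06',
                           '2020-12-01 05:06'])
left_data = pd.Series([12, 12, 13, 15, 16, 15], index=left_key, name='master')

# Place the readings on a 15-minute grid without filling anything in.
on_grid = left_data.resample('15min').asfreq()

# No original timestamp coincides with a grid point, so every slot is empty.
all_missing = bool(on_grid.isna().all())
print(all_missing)
```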

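Is there a way to make resample interpolate the readings so that I get the correct values on the hour? (Is there a better title for this question?)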

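Update

Although these solutions work, they do not scale to large amounts of data. 1000 rows is too much for my machine! Even with a coarser initial resample interval, it takes a lot of memory and time to complete.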
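
To put rough numbers on the scaling problem, the sketch below estimates how many rows the intermediate `'1ms'` resample creates. The helper name `intermediate_rows` and the assumption of evenly spaced hourly readings are illustrative, not part of the original post.

```python
import pandas as pd

def intermediate_rows(n_readings, spacing="1h", fine_freq="1ms"):
    """Rows produced by upsampling n evenly spaced readings to fine_freq (hypothetical helper)."""
    span = (n_readings - 1) * pd.Timedelta(spacing)
    # Number of fine-grained steps across the span, plus the starting point.
    return span // pd.Timedelta(fine_freq) + 1

rows_small = intermediate_rows(6)     # the six-reading example in the question
rows_large = intermediate_rows(1000)  # the 1000-row case mentioned above

print(rows_small)  # 18000001
print(rows_large)  # 3596400001
```

At 8 bytes per float64 value, the 1000-row case needs roughly 29 GB for the values alone, before counting the index.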

Here is another solution, taken from this question: Interpolate one time series onto custom time series

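from datetime import datetime

# create a new index for the ranges of datetimes required
starts = df.index.min()
# round the first timestamp down to the previous 15-minute boundary
starts = datetime(starts.year, starts.month, starts.day, starts.hour, 15 * (starts.minute // 15))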
master = pd.date_range(starts, df.index.max(), freq="15min")

# will need this to identify original data rows later
df['tag'] = True

# merge with original data and interpolate missing rows
idx = df.index.union(master)
df2 = df.reindex(idx).interpolate('index')

# now remove the things we don't want
df2.drop(df2.index[0], inplace=True)  # first value will be NaN (unless has real data)
# use the tag column to remove the original data and then drop that column
df2 = df2[df2['tag'].isna()]
df2.drop(columns=['tag',], inplace=True)

This is much faster!
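
The same union-and-interpolate idea can be sketched as a small self-contained helper. This is a minimal sketch, not the code from the linked question: the function name `interpolate_onto_grid` is hypothetical, it uses time-weighted linear interpolation instead of a spline, and it only returns grid points that lie between the first and last reading.

```python
import pandas as pd

def interpolate_onto_grid(series, freq="15min"):
    """Linearly interpolate an irregular series onto a regular grid (hypothetical helper)."""
    # Regular grid limited to the span covered by real readings.
    grid = pd.date_range(series.index.min().ceil(freq),
                         series.index.max().floor(freq),
                         freq=freq)
    # Add the grid points to the original index, fill them by time-weighted
    # linear interpolation, then keep only the grid points.
    combined = series.reindex(series.index.union(grid)).interpolate(method="time")
    return combined.reindex(grid)

left_key = pd.to_datetime(['2020-12-01 00:06', '2020-12-01 01:06',
                           '2020-12-01 02:06', '2020-12-01 03:06',
                           '2020-12-01 04:06', '2020-12-01 05:06'])
left_data = pd.Series([12, 12, 13, 15, 16, 15], index=left_key, name='master')

result = interpolate_onto_grid(left_data)
print(result.head())
```

The intermediate index only grows by the number of grid points, so memory use stays proportional to the size of the output.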

【Comments】:

    Tags: pandas time-series


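    【Solution 1】:

    OK. This is not the prettiest of solutions, but it has worked for me in the past. The trick is to resample twice: first at a negligibly small interval and interpolate, then at the interval you actually want. Before that, you need to set the index to the time (date) column.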

    left_key =  pd.to_datetime(['2020-12-01 00:06',
                   '2020-12-01 01:06',
                   '2020-12-01 02:06',
                   '2020-12-01 03:06',
                   '2020-12-01 04:06',
                   '2020-12-01 05:06'])
    left_data = pd.Series([12,12,13,15,16,15])
    
    df = pd.DataFrame({'Dates':left_key , 'Values':left_data})
    df.set_index('Dates', inplace=True)    
    df1 = df.resample('1ms').interpolate(method='spline', order=2).resample('15min').first()
    

    which gives

                            Values
    Dates                         
    2020-12-01 00:00:00  12.000000
    2020-12-01 00:15:00  11.653527
    2020-12-01 00:30:00  11.960000
    2020-12-01 00:45:00  12.255313
    2020-12-01 01:00:00  12.539464
    2020-12-01 01:15:00  12.812455
    2020-12-01 01:30:00  13.074286
    2020-12-01 01:45:00  13.324955
    2020-12-01 02:00:00  13.564464
    2020-12-01 02:15:00  13.792813
    2020-12-01 02:30:00  14.010000
    2020-12-01 02:45:00  14.216027
    2020-12-01 03:00:00  14.410893
    2020-12-01 03:15:00  14.594598
    2020-12-01 03:30:00  14.767143
    2020-12-01 03:45:00  14.928527
    2020-12-01 04:00:00  15.078750
    2020-12-01 04:15:00  15.217812
    2020-12-01 04:30:00  15.345714
    2020-12-01 04:45:00  15.462455
    2020-12-01 05:00:00  15.568036
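
A lighter variant of the same two-step trick, sketched under two assumptions that are not in the original answer: the readings fall on whole minutes (so a `'1min'` intermediate grid already contains every reading), and linear interpolation is acceptable in place of the spline. The names `fine` and `on_quarter` are illustrative. Selecting the rows by timestamp rather than with `.first()` also avoids labelling the `00:06` reading as `00:00`.

```python
import pandas as pd

left_key = pd.to_datetime(['2020-12-01 00:06', '2020-12-01 01:06',
                           '2020-12-01 02:06', '2020-12-01 03:06',
                           '2020-12-01 04:06', '2020-12-01 05:06'])
left_data = pd.Series([12, 12, 13, 15, 16, 15], index=left_key, name='Values')

# Step 1: upsample to one-minute steps; every original reading sits on this grid.
fine = left_data.resample('1min').interpolate(method='linear')

# Step 2: keep only the rows that fall exactly on a quarter-hour boundary.
on_quarter = fine[fine.index.minute % 15 == 0]
print(on_quarter.head())
```

For the five-hour example this creates about 300 intermediate rows instead of 18 million.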
    

    Then concatenate with the original df:

    frames = [df, df1]
    df2 = pd.concat(frames)
    df2.sort_values('Dates')
    

    which returns

                            Values
    Dates                         
    2020-12-01 00:00:00  12.000000
    2020-12-01 00:06:00  12.000000
    2020-12-01 00:15:00  11.653527
    2020-12-01 00:30:00  11.960000
    2020-12-01 00:45:00  12.255313
    2020-12-01 01:00:00  12.539464
    2020-12-01 01:06:00  12.000000
    2020-12-01 01:15:00  12.812455
    2020-12-01 01:30:00  13.074286
    2020-12-01 01:45:00  13.324955
    2020-12-01 02:00:00  13.564464
    2020-12-01 02:06:00  13.000000
    2020-12-01 02:15:00  13.792813
    2020-12-01 02:30:00  14.010000
    2020-12-01 02:45:00  14.216027
    2020-12-01 03:00:00  14.410893
    2020-12-01 03:06:00  15.000000
    2020-12-01 03:15:00  14.594598
    2020-12-01 03:30:00  14.767143
    2020-12-01 03:45:00  14.928527
    2020-12-01 04:00:00  15.078750
    2020-12-01 04:06:00  16.000000
    2020-12-01 04:15:00  15.217812
    2020-12-01 04:30:00  15.345714
    2020-12-01 04:45:00  15.462455
    2020-12-01 05:00:00  15.568036
    2020-12-01 05:06:00  15.000000
    

    【Comments】:

    • Nice! Thanks.