【Question title】: Interpolate between two nearby rows of Dataframe
【Posted】: 2020-08-06 08:46:19
【Question】:

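I want to interpolate missing values within groups of a DataFrame, using the values from the previous and next rows.

Here is the df (the groups contain more records, but for this example I kept 3 per group):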

import numpy as np
import pandas as pd
df = pd.DataFrame({'Group': ['a','a','a','b','b','b','c','c','c'],'Yval': [1,np.nan,5,2,np.nan,8,5,np.nan,10],'Xval': [0,3,2,4,5,8,3,1,9],'PTC': [0,1,0,0,1,0,0,1,0]})

df:

    Group   Yval    Xval    PTC
0   a       1.0     0       0
1   a       NaN     3       1
2   a       5.0     2       0
3   b       2.0     4       0
4   b       NaN     5       1
5   b       8.0     8       0
6   c       5.0     3       0
7   c       NaN     1       1
8   c       10.0    9       0

For the PTC rows (points to calculate), I need to interpolate Yval using the Xval and Yval from the rows at -1 and +1. I.e. for group a, I want: df.iloc[1,1]=np.interp(3, [0,2], [1,5])
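For reference, a minimal sketch of what these per-row calls evaluate to on the sample data, assuming the default behaviour of np.interp. Note that np.interp does not extrapolate: when x falls outside the range of xp it returns the nearest endpoint value, so groups a and c come out clamped rather than extended along the line.

```python
import numpy as np

# Each call has the form np.interp(x, [x_prev, x_next], [y_prev, y_next])
a = np.interp(3, [0, 2], [1, 5])    # x=3 is above xp[-1]=2, so the result is clamped to 5.0
b = np.interp(5, [4, 8], [2, 8])    # x=5 is inside [4, 8]: 2 + (5-4)/(8-4) * (8-2) = 3.5
c = np.interp(1, [3, 9], [5, 10])   # x=1 is below xp[0]=3, so the result is clamped to 5.0
print(a, b, c)
```

If true extrapolation is wanted for rows like those in groups a and c, np.interp alone will not provide it and the line equation would have to be evaluated by hand.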

Here is what I tried, using loc and shift together with the interp function I found in this post:

df.loc[(df['PTC'] == 1), ['Yval']]= \
np.interp(df['Xval'], (df['Xval'].shift(+1),df['Xval'].shift(-1)),(df['Yval'].shift(+1),df['Yval'].shift(-1)))

The error I get:

ValueError: object too deep for desired array
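A likely cause, sketched here under the assumption that the tuples of shifted Series are what trips np.interp: its xp and fp arguments must be 1-D sequences, while a tuple of two equal-length arrays is converted to a 2-D array. The arrays below are made-up stand-ins for the shifted columns.

```python
import numpy as np

x = np.array([3.0, 5.0, 1.0])
# A tuple of two arrays becomes a 2-D array of shape (2, 3) when converted
xp_2d = (np.array([0.0, 4.0, 3.0]), np.array([2.0, 8.0, 9.0]))
fp_2d = (np.array([1.0, 2.0, 5.0]), np.array([5.0, 8.0, 10.0]))

try:
    np.interp(x, xp_2d, fp_2d)   # xp and fp are expected to be 1-D
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

In other words, np.interp interpolates many x values against one shared curve; it does not interpolate row by row with a different pair of neighbours for each x.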

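【Comments】:

  • So basically each group always has 3 data points?
  • No, each group has a different number of data points, but I am only interested in the two adjacent ones

Tags: python dataframe interpolation linear-interpolation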


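【Solution 1】:
# Neighbouring values: shift(+1) is the previous row, shift(-1) is the next row
df['Xval-1'] = df['Xval'].shift(+1)
df['Xval+1'] = df['Xval'].shift(-1)
df['Yval-1'] = df['Yval'].shift(+1)
df['Yval+1'] = df['Yval'].shift(-1)

# Interpolate every row from its two neighbours (np.interp expects xp in increasing order)
df["PTC_interpol"] = df.apply(lambda x: np.interp(x['Xval'], [x['Xval-1'], x['Xval+1']], [x['Yval-1'], x['Yval+1']]), axis=1)

# Keep the interpolated value only on the rows flagged PTC == 1
df['Yval'] = np.where(df['PTC'] == 1, df["PTC_interpol"], df['Yval'])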
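The shifts above run over the whole frame, so a PTC row at the start or end of a group would pick up a neighbour from a different group. A group-aware, vectorized variant might look like the sketch below. It is an alternative, not the answerer's method, and it assumes that within each group the previous row's Xval is strictly smaller than the next row's Xval. The names g, x0, x1, y0, y1, t and interp are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Group': list('aaabbbccc'),
    'Yval': [1, np.nan, 5, 2, np.nan, 8, 5, np.nan, 10],
    'Xval': [0, 3, 2, 4, 5, 8, 3, 1, 9],
    'PTC': [0, 1, 0, 0, 1, 0, 0, 1, 0],
})

g = df.groupby('Group')
x0, x1 = g['Xval'].shift(1), g['Xval'].shift(-1)   # previous / next X within each group
y0, y1 = g['Yval'].shift(1), g['Yval'].shift(-1)   # previous / next Y within each group

# Fractional position between the neighbours, clipped to [0, 1]
# so that out-of-range points are clamped the way np.interp clamps them
t = ((df['Xval'] - x0) / (x1 - x0)).clip(0, 1)
interp = y0 + t * (y1 - y0)

# Write the interpolated value only on the rows flagged PTC == 1
mask = df['PTC'] == 1
df.loc[mask, 'Yval'] = interp[mask]
print(df)
```

On the sample data this fills Yval with 5.0, 3.5 and 5.0 for groups a, b and c. Dropping the clip call would extrapolate along the line instead of clamping.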
