Insert row in Pandas Dataframe based on a condition: answers

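【Question title】: Insert row in Pandas Dataframe based on a condition
【Posted】: 2017-06-15 17:25:20
【Question】:

I am working with a huge time-series dataset in Pandas. Whenever the difference between two consecutive index values is greater than 5, I want to add a row between those two rows of the DataFrame.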

Actual:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544661  9     NaN

Expected:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544657  9     0
1497544661  9     NaN

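I used diff() on the index to get the difference between consecutive index values, but I am not sure how to insert a record when that difference is greater than 5.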

import pandas as pd

df = pd.DataFrame([{"Date": 1497544649,"a":1, "result": 1}, 
                   {"Date": 1497544652,"a": 9, "result": 1},
                   {"Date": 1497544661,"a": 9, "result": 1}])
df.set_index("Date", inplace=True)

df.index.to_series().diff().fillna(0).to_frame("diff")

Any pointers on how to achieve this would be greatly appreciated.

Thanks

【Question comments】:

Tags: python python-2.7 pandas dataframe

【Solution 1】:

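You are off to a good start. Add the diff as a column so it is easy to filter on.

Then get the index labels of the rows that match your rule and insert your rows.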

df['diff'] = df.index.to_series().diff().fillna(0)  # gap to the previous index value

# index labels whose gap to the previous row is greater than 5
matches = df[df['diff'] > 5].index.tolist()

for i in matches:
    diff = df.loc[i, 'diff']
    interval = int(diff // 2)  # pick a label somewhere in the middle of the gap
    # insert a row before the matched index; values are for columns a, result, diff
    df.loc[i - interval] = [0, 0, diff - interval]
    df.loc[i, 'diff'] = interval  # may not need to update the interval

df = df.sort_index()  # .loc appends new rows at the end, so sort by index afterwards

del df['diff']  # the helper column can be removed now
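
As an alternative to modifying the frame row by row, the same idea can be sketched by building all the new rows at once and concatenating them. This is only a sketch under a few assumptions: the index is a sorted, unique integer index; each new row starts as a copy of the row that follows the gap (which happens to reproduce a = 9 in the expected output); and `insert_gap_rows`, `threshold` and `overrides` are hypothetical names chosen for the example, not part of pandas.

```python
import pandas as pd


def insert_gap_rows(df, threshold=5, overrides=None):
    """Return a sorted copy of df with one extra row inside every index gap > threshold.

    Hypothetical helper: each new row copies the row that follows the gap,
    then applies `overrides` (a mapping of column name -> value).
    """
    gaps = df.index.to_series().diff()
    wide = gaps[gaps > threshold]              # gaps that need a filler row
    if wide.empty:
        return df.copy()
    new_rows = df.loc[wide.index].copy()       # assumption: start from the later row
    # Place each new label half a gap (rounded down) before the later row.
    new_labels = wide.index.to_numpy() - (wide // 2).astype("int64").to_numpy()
    new_rows.index = pd.Index(new_labels, name=df.index.name)
    for column, value in (overrides or {}).items():
        new_rows[column] = value
    return pd.concat([df, new_rows]).sort_index()


df = pd.DataFrame(
    {"a": [1, 9, 9], "result": [1, 1, 1]},
    index=pd.Index([1497544649, 1497544652, 1497544661], name="Date"),
)
filled = insert_gap_rows(df, threshold=5, overrides={"result": 0})
print(filled)
```

Because the helper returns a new frame, the original `df` is left untouched and no temporary diff column has to be cleaned up afterwards.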

【Comments】: