【Title】: Pandas, repeat rows with shift
【Posted】: 2021-05-25 17:07:33
【Question】:

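I am trying to repeat the rows of a dataframe, where each repeat contains a shifted version of the original row. To clarify, in the example below, the pressure and speed in each repeated row need to be the shifted values: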

import datetime
import random

import pandas as pd

random.seed(42)
N = 4
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
# N random timestamps within the first two minutes, in ascending order
dates = [date_0 + datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(N)]
dates.sort()
speeds = [random.uniform(1, 10) for i in range(N)]
speeds.sort()
# pressure is roughly speed squared plus Gaussian noise
pressures = [i**2 + random.normalvariate(0, 30) for i in speeds]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)
print(df)

which outputs:

                                speed   pressure
2020-01-01 00:00:03.001291  1.782449  -0.964646
2020-01-01 00:00:26.785289  7.090295  20.997679
2020-01-01 00:00:33.003518  7.628241  75.141566
2020-01-01 00:01:16.731216  9.029616  46.504617

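Now I need to obtain the following dataframe, which repeats each row and shifts the values of the 'speed' and 'pressure' columns within each repeat (here the shift is 3, so each row appears with shifts of 0, 1, and 2 rows):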

                         index     speed   pressure
0  2020-01-01 00:00:03.001291  1.782449  -0.964646
1  2020-01-01 00:00:03.001291       NaN        NaN
2  2020-01-01 00:00:03.001291       NaN        NaN
3  2020-01-01 00:00:26.785289  7.090295  20.997679
4  2020-01-01 00:00:26.785289  1.782449  -0.964646
5  2020-01-01 00:00:26.785289       NaN        NaN
6  2020-01-01 00:00:33.003518  7.628241  75.141566
7  2020-01-01 00:00:33.003518  7.090295  20.997679
8  2020-01-01 00:00:33.003518  1.782449  -0.964646
9  2020-01-01 00:01:16.731216  9.029616  46.504617
10 2020-01-01 00:01:16.731216  7.628241  75.141566
11 2020-01-01 00:01:16.731216  7.090295  20.997679

This can be obtained with a loop, but it is too slow for large dataframes:

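import numpy as np

n = 3
# repeat every row n times, then move the timestamps into a regular 'index' column
df_rpt = df.reindex(np.repeat(df.index.values, n), method='bfill')
df_rpt.reset_index(inplace=True)
Col = ['speed', 'pressure']
# precompute the shifted versions of each column on the original frame
for c in Col:
    for i in range(n):
        df.loc[:, c + '_shift' + str(i)] = df.loc[:, c].shift(i)
df_s = df_rpt.copy()
# fill each repeated row with the value shifted by (row position mod n)
for i in df_rpt.index:
    for c in Col:
        df_s.loc[i, c] = df.loc[df_s.iloc[i]['index'], c + '_shift' + str(i % n)]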

Is there a faster way?

【Question comments】:

    Tags: python pandas dataframe


    【Solution 1】:

    Try:

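    N = 3

    # each cell becomes a list: [value, value shifted by 1, ..., value shifted by N-1]
    # (note that this overwrites the columns of df in place)
    df.speed = list(map(list, zip(*[df.speed.shift(x) for x in range(N)])))
    df.pressure = list(map(list, zip(*[df.pressure.shift(x) for x in range(N)])))

    # explode turns every list element into its own row and repeats the index
    df_out = df.explode("speed")
    df_out.pressure = df.pressure.explode()
    print(df_out)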
    

    Prints:

                                   speed   pressure
    2020-01-01 00:00:03.001291  1.782449  -0.964646
    2020-01-01 00:00:03.001291       NaN        NaN
    2020-01-01 00:00:03.001291       NaN        NaN
    2020-01-01 00:00:26.785289  7.090295  20.997679
    2020-01-01 00:00:26.785289  1.782449  -0.964646
    2020-01-01 00:00:26.785289       NaN        NaN
    2020-01-01 00:00:33.003518  7.628241  75.141566
    2020-01-01 00:00:33.003518  7.090295  20.997679
    2020-01-01 00:00:33.003518  1.782449  -0.964646
    2020-01-01 00:01:16.731216  9.029616  46.504617
    2020-01-01 00:01:16.731216  7.628241  75.141566
    2020-01-01 00:01:16.731216  7.090295  20.997679
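
    A possible variation on the same idea, offered only as a sketch: pandas 1.3 and later accept a list of columns in `DataFrame.explode`, so both columns can be exploded in one call. The helper name `explode_shifted` and the small `demo` frame are made up for illustration, and the cast back to float assumes the columns are numeric.

```python
import pandas as pd


def explode_shifted(df, cols, n):
    """Repeat each row n times; the k-th repeat holds the values shifted by k rows."""
    tmp = df.copy()  # leave the caller's frame untouched
    for c in cols:
        # one list per row: [value, value shifted by 1, ..., value shifted by n-1]
        tmp[c] = list(map(list, zip(*[df[c].shift(k) for k in range(n)])))
    # exploding several columns at once needs pandas >= 1.3
    out = tmp.explode(cols)
    # explode leaves object dtype behind, so cast back to float
    return out.astype({c: float for c in cols})


# hypothetical toy data standing in for the speed/pressure frame
demo = pd.DataFrame(
    {"speed": [1.0, 2.0, 3.0, 4.0], "pressure": [10.0, 20.0, 30.0, 40.0]},
    index=list("abcd"),
)
result = explode_shifted(demo, ["speed", "pressure"], 3)
print(result)
```

    Unlike the snippet above, this version works on a copy, so the original frame keeps its scalar columns.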
    

    【Discussion】:

    • Very nice. +1 @AndrejKesely
    • Very efficient.
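
    For very large frames it may also be worth trying plain NumPy indexing, which avoids building Python lists altogether. The sketch below is not from the answer above. It relies on the observation that output row j is source row j // n shifted back by j % n, and it assumes all columns are numeric. The function name `repeat_with_shift` and the `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd


def repeat_with_shift(df, n):
    """Output row j takes its values from source row (j // n) - (j % n), or NaN."""
    rows = np.arange(len(df))[:, None]  # source row positions, as a column
    lags = np.arange(n)[None, :]        # shift amounts 0 .. n-1, as a row
    src = (rows - lags).ravel()         # position to read for every output row
    # fancy indexing returns a copy, so the assignment below does not touch df
    values = df.to_numpy(dtype=float)[np.clip(src, 0, None)]
    values[src < 0] = np.nan            # nothing exists before the first row
    return pd.DataFrame(values, index=df.index.repeat(n), columns=df.columns)


# hypothetical toy data standing in for the speed/pressure frame
demo = pd.DataFrame(
    {"speed": [1.0, 2.0, 3.0, 4.0], "pressure": [10.0, 20.0, 30.0, 40.0]},
    index=list("abcd"),
)
fast = repeat_with_shift(demo, 3)
print(fast)
```

    Whether this is actually faster than the explode approach depends on the data size and the pandas version, so it is worth timing both on real data.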