
【Question title】: How to create a column using a function based on previous values in the column in Python
【Posted】: 2020-01-12 04:51:53
【Question】:

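My problem

I have a loop that creates a value for x in time period t based on the value of x in time period t-1. The loop is really slow, so I wanted to try turning it into a vectorized function. I tried using np.where with shift(), but had no joy. Any idea how I might get around this?

Thanks!

My code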

import numpy as np
import pandas as pd

# read_csv already returns a DataFrame
df = pd.read_csv('y_list.csv', delimiter=',')

# Seed the recursion: var is 0 in the first period
df.loc[df.index[0], 'var'] = 0

# Write through a single .iloc call to avoid chained assignment
var_col = df.columns.get_loc('var')
for x in range(1, len(df.index)):
    prev = df['var'].iloc[x - 1]
    if df['LAST'].iloc[x] > 0:
        df.iloc[x, var_col] = (prev * 2 + df['LAST'].iloc[x]) / 3
    else:
        df.iloc[x, var_col] = (prev * 2) / 3

df

Input data

Dates,LAST
03/09/2018,-7
04/09/2018,5
05/09/2018,-4
06/09/2018,5
07/09/2018,-6
10/09/2018,6
11/09/2018,-7
12/09/2018,7
13/09/2018,-9

Output

Dates,LAST,var
03/09/2018,-7,0.000000
04/09/2018,5,1.666667
05/09/2018,-4,1.111111
06/09/2018,5,2.407407
07/09/2018,-6,1.604938
10/09/2018,6,3.069959
11/09/2018,-7,2.046639
12/09/2018,7,3.697759
13/09/2018,-9,2.465173

【Comments】:

Tags: python pandas loops numpy dataframe

【Solution 1】:

You are looking for ewm:

arg = df.LAST.clip(lower=0)
arg.iloc[0] = 0
arg.ewm(alpha=1/3, adjust=False).mean()

Output:

0    0.000000
1    1.666667
2    1.111111
3    2.407407
4    1.604938
5    3.069959
6    2.046639
7    3.697759
8    2.465173
Name: LAST, dtype: float64
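
The match with the question's expected output is not a coincidence. With adjust=False, pandas computes the exponentially weighted mean recursively: y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t]. With alpha = 1/3 and x equal to LAST clipped at zero, that is the same recursion as the loop in the question. A minimal sketch that checks this, assuming the sample data is typed in by hand rather than read from y_list.csv:

```python
import numpy as np
import pandas as pd

# Sample data from the question, typed in by hand (no CSV needed).
df = pd.DataFrame({"LAST": [-7, 5, -4, 5, -6, 6, -7, 7, -9]})

# Vectorized version: negative values contribute 0,
# and the first value (forced to 0) seeds the recursion.
arg = df["LAST"].clip(lower=0).astype(float)
arg.iloc[0] = 0.0
df["var"] = arg.ewm(alpha=1/3, adjust=False).mean()

# Reference loop implementing var[t] = (2 * var[t-1] + max(LAST[t], 0)) / 3.
expected = [0.0]
for last in df["LAST"].iloc[1:]:
    gain = last if last > 0 else 0
    expected.append((2 * expected[-1] + gain) / 3)

# True when the ewm result and the loop agree.
print(np.allclose(df["var"], expected))
```

Assigning the result to df["var"] is what creates the new column; calling ewm(...).mean() on its own only returns a Series.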

【Comments】:

Thank you very much. It doesn't seem to create a new column for me, though? Also, what is the logic behind it?
df["var"] = arg.ewm(alpha=1/3, adjust=False).mean()?
Thanks, that works. If I wanted to change the 2/3 ratio to 13/14, how would I do that in your code?
alpha = 1/3 = 1 - 2/3, so alpha = 1/14?
I tried the code below (from another thread you answered), but it gives very different numbers. The code is... for x in range(1, len(df.index)): if df["delta"].iloc[x] > 0: df.iloc[x, -1] = ((df["avg_gain"].iloc[x - 1] * 13) + df["delta"].iloc[x]) / 14 else: df.iloc[x, -1] = ((df["avg_gain"].iloc[x - 1].copy() * 13) + 0) / 14
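
As a sketch of the 13/14 case raised in these comments: keeping a fraction (n - 1)/n of the previous value corresponds to alpha = 1/n, so 13/14 means alpha = 1/14. The helper name smoothed_gain, its seed parameter, and the sample delta values are assumptions made for illustration; they do not come from the thread. If the numbers still differ from a loop, one possible cause (a guess, since the other thread's data is not shown) is a different starting value, because with adjust=False the first element of the series seeds the recursion.

```python
import pandas as pd

def smoothed_gain(values, n, seed=0.0):
    """Hypothetical helper.

    Computes y[0] = seed and
    y[t] = ((n - 1) * y[t-1] + max(values[t], 0)) / n  for t >= 1.
    """
    arg = values.clip(lower=0).astype(float)
    arg.iloc[0] = seed  # with adjust=False, ewm starts from the first value
    return arg.ewm(alpha=1 / n, adjust=False).mean()

# Illustrative data only (the question's LAST column reused as "delta").
delta = pd.Series([-7, 5, -4, 5, -6, 6, -7, 7, -9])
avg_gain = smoothed_gain(delta, n=14)

# Reference loop with the 13/14 ratio, seeded with 0 like the helper.
check = [0.0]
for d in delta.iloc[1:]:
    check.append((13 * check[-1] + max(d, 0)) / 14)

print(all(abs(a - b) < 1e-12 for a, b in zip(avg_gain, check)))
```

Calling smoothed_gain(delta, n=3) reproduces the original 2/3 ratio.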

【Solution 2】:

You can use df.shift to shift the dataframe by 1 row (the default), and convert the if-else block into a vectorized np.where:

In [36]: df
Out[36]: 
        Dates  LAST  var
0  03/09/2018    -7  0.0
1  04/09/2018     5  1.7
2  05/09/2018    -4  1.1
3  06/09/2018     5  2.4
4  07/09/2018    -6  1.6
5  10/09/2018     6  3.1
6  11/09/2018    -7  2.0
7  12/09/2018     7  3.7
8  13/09/2018    -9  2.5

In [37]: (df.shift(1)['var']*2 + np.where(df['LAST']>0, df['LAST'], 0)) / 3
Out[37]: 
0         NaN
1    1.666667
2    1.133333
3    2.400000
4    1.600000
5    3.066667
6    2.066667
7    3.666667
8    2.466667
Name: var, dtype: float64

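【Comments】:

Thanks for your answer. When I enter the code, it only gives me NaN. Am I missing something?
@pythonlearner13, see the edit. I suspect a data issue? I assumed what you posted was the input?
Sorry, that's the reason. The Dates and LAST columns are the input, and the var column is the expected output. Is there a way to compute var without using the var column?
I have edited the original post to make it clearer.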
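
On the last question in this thread: the shift expression needs an existing var column because it applies only one step of the recursion, so each row depends on a previous value that must already be stored. One way to compute var from LAST alone, sketched here with the sample data typed in by hand and a hypothetical helper name (recursive_var), is to run the recursion over a plain NumPy array. This keeps the explicit loop but avoids the per-row .iloc overhead of the original code.

```python
import numpy as np
import pandas as pd

def recursive_var(last, weight=2.0):
    """Hypothetical helper.

    Computes var[0] = 0 and
    var[t] = (weight * var[t-1] + max(last[t], 0)) / (weight + 1)  for t >= 1.
    """
    # Negative values contribute nothing, as in the question's else branch.
    gains = np.where(last > 0, last, 0).astype(float)
    out = np.zeros(len(gains))
    for t in range(1, len(gains)):
        out[t] = (weight * out[t - 1] + gains[t]) / (weight + 1)
    return out

# Sample data from the question; only LAST is needed as input.
df = pd.DataFrame({"LAST": [-7, 5, -4, 5, -6, 6, -7, 7, -9]})
df["var"] = recursive_var(df["LAST"].to_numpy())
print(df)
```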