我重新定义了一个类似的数据框,因为使用随机变量生成器会使调试变得困难。在每次测试中,数据帧随机变化。
为了简单起见,我将使用 2 和 3 个移动平均线周期。
起始数据框
Date Store Sales
9 2022-10-10 1 5347
8 2022-10-09 1 1561
7 2022-10-08 1 5648
6 2022-10-07 1 8123
5 2022-10-06 1 1401
4 2022-10-05 0 2745
3 2022-10-04 0 7848
2 2022-10-03 0 3151
1 2022-10-02 0 4296
0 2022-10-01 0 9028
它给 :
ex = pd.DataFrame({
"Date": pd.date_range('2022-10-01', '2022-10-10'),
"Store": [0]*5+[1]*5,
"Sales": [9028, 4296, 3151, 7848, 2745, 1401, 8123, 5648, 1561, 5347],
})
ex.sort_values(['Store','Date'], ascending=False, inplace=True)
建议代码
import pandas as pd
import numpy as np
ex = pd.DataFrame({
"Date": pd.date_range('2022-10-01', '2022-10-10'),
"Store": [0]*5+[1]*5,
"Sales": [9028, 4296, 3151, 7848, 2745, 1401, 8123, 5648, 1561, 5347],
})
ex.sort_values(['Store','Date'], ascending=False, inplace=True)
periods=(2,3)
### STEP 1 -- Initialization : exhaustive Mean() Calculation
for per in periods:
ex["Sales_mean_{0}_days".format(per)] = (
ex.groupby(['Store'])['Sales']
.apply(lambda g: g.shift(-1)
.rolling(per)
.mean()
.shift(-per+1))
)
### STEP 2 -- New Row Insertion
def fmt_newRow(g, newRow, periods):
return {
"Date": pd.Timestamp(newRow[0]),
"Store": newRow[1],
"Sales": newRow[2],
"Sales_mean_{0}_days".format(periods[0]): (g['Sales'].iloc[0:periods[0]].sum()) / periods[0],
"Sales_mean_{0}_days".format(periods[1]): (g['Sales'].iloc[0:periods[1]].sum()) / periods[1],
}
def add2DF(ex, newRow):
# g : sub-Store group
g = (
ex.loc[ex.Store==newRow[1]]
.sort_values(['Store','Date'], ascending=False)
)
# Append newRow like a dictionnary and sort by ['Store','Date']
ex = (
ex.append(fmt_newRow(g, newRow, periods), ignore_index=True)
.sort_values(['Store','Date'], ascending=False)
.reset_index(drop=True)
)
#
return ex
newRow = ['2022-10-11', 1, 2803] # [Date, Store, Sales]
ex = add2DF(ex, newRow)
print(ex)
结果
Date Store Sales Sales_mean_2_days Sales_mean_3_days
0 2022-10-11 1 2803 3454.0 4185.333333
1 2022-10-10 1 5347 3604.5 5110.666667
2 2022-10-09 1 1561 6885.5 5057.333333
3 2022-10-08 1 5648 4762.0 NaN
4 2022-10-07 1 8123 NaN NaN
5 2022-10-06 1 1401 NaN NaN
6 2022-10-05 0 2745 5499.5 5098.333333
7 2022-10-04 0 7848 3723.5 5491.666667
8 2022-10-03 0 3151 6662.0 NaN
9 2022-10-02 0 4296 NaN NaN
10 2022-10-01 0 9028 NaN NaN
注释
- 新行是这样一个列表:
[Date, Store, Sales]
- 每次需要将新行保存到数据框时,将其传递给
fmt_newRow函数和相应的子组g
-
fmt_newRow 以字典的形式返回一个新行,该行与append Pandas 函数集成在数据框中
- 无需重新计算所有平均值,因为只有
per-last g 值用于计算新的行平均值
- 第 2 期和第 3 期的移动平均线已经过检查并且是正确的。