Answers: Normalization of a single DataFrame column

【Question title】: Normalization of a single column of a DataFrame
【Posted】: 2021-04-05 19:36:58
【Question description】:

I have a dataframe like the one shown in the figure:

I need to normalize the Close column of the dataframe as follows:

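For each symbol, the closing price of every subsequent day has to be divided by that symbol's first closing price. So the normalized closing price of APLAPOLLO on 24/11/2020 is calculated as:

Normalised_Close_price = (Close on 24/11/2020) / (Close on 23/11/2020) = 0.9915 (valid only for APLAPOLLO)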
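When the symbol changes, the method stays the same and only the values in the formula change. So for AUBANK, the normalized close on 24/11/2020 is calculated as:

Normalised_Close_price = (Close on 24/11/2020) / (Close on 23/11/2020) = 1.0054 (valid only for AUBANK)
The normalized closing prices of the other symbols should be calculated the same way.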

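So after the calculation the dataframe should look like this:

For example, for APLAPOLLO from the explanation above:

Normalized value for 24/11/2020 = (3219.95 / 3247.45) = 0.991532
Normalized value for 25/11/2020 = (Close on 25/11/2020) / (Close on 23/11/2020) = 0.991686
Normalized value for 26/11/2020 = (Close on 26/11/2020) / (Close on 23/11/2020) = 0.978907

The same goes for the other symbols.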

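I referred to the following answer on normalizing dataframe columns:

Normalize columns of pandas data frame

It did not help, because in my case the values change from symbol to symbol.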

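【Question discussion】:

Your explanation and your normalized dataframe differ slightly. Normalised_Close_price=(Close on 24/11/2020)/(Close on 23/11/2020)=0.9915 (valid only for Aplapollo), but in the dataframe the normalized value of APLAPOLLO on 24/11/2020 is 1.000155. Am I missing something?
Can you give an example of the first three calculations for a single group?
@mmrbulbul No, you did not miss anything; I had posted it incorrectly.
@pygirl Added an example.
If you want to fill with 1, use .fillna(1) --> df1.groupby('Symbol')['Close'].transform(lambda x: x/x.shift(1)).fillna(1)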

Tags: python pandas dataframe

【Solution 1】:

Try:

df1['Normalize'] = df1.groupby('Symbol')['Close'].transform(lambda x: x / x.iloc[0]).fillna(1)

As Shubham said:

You can divide by the first value of each group:

df1['Normalize'] = df1['Close'] / df1.groupby('Symbol')['Close'].transform('first')

df1:

    Date        Symbol      Close   Normalize
0   2020-11-23  APLAPOLLO   3247.45 1.000000
1   2020-11-24  APLAPOLLO   3219.95 0.991532
2   2020-11-25  APLAPOLLO   3220.45 0.991686
3   2020-11-26  APLAPOLLO   3178.95 0.978907
4   2020-11-27  APLAPOLLO   3378.90 1.040478
5   2020-12-01  APLAPOLLO   3446.85 1.061402
6   2020-12-02  APLAPOLLO   3514.55 1.082249
7   2020-12-03  APLAPOLLO   3545.80 1.091872
8   2020-12-04  APLAPOLLO   3708.60 1.142004
9   2020-12-07  APLAPOLLO   3868.55 1.191258
10  2020-12-08  APLAPOLLO   3750.30 1.154845
11  2020-12-09  APLAPOLLO   3801.35 1.170565
12  2020-12-10  APLAPOLLO   3766.65 1.159879
13  2020-12-11  APLAPOLLO   3674.30 1.131442
14  2020-12-14  APLAPOLLO   3814.80 1.174706
15  2020-12-15  APLAPOLLO   780.55  0.240358
16  2020-12-16  APLAPOLLO   790.20  0.243329
17  2020-12-17  APLAPOLLO   791.20  0.243637
18  2020-12-18  APLAPOLLO   769.70  0.237017
19  2020-12-21  APLAPOLLO   726.60  0.223745
20  2020-12-22  APLAPOLLO   744.30  0.229195
21  2020-11-23  AUBANK      869.65  1.000000
22  2020-11-24  AUBANK      874.35  1.005404
23  2020-11-25  AUBANK      856.25  0.984592
24  2020-11-26  AUBANK      861.05  0.990111
25  2020-11-27  AUBANK      839.05  0.964813
26  2020-12-01  AUBANK      872.90  1.003737
27  2020-12-02  AUBANK      886.65  1.019548
28  2020-12-03  AUBANK      880.30  1.012246
29  2020-12-04  AUBANK      880.45  1.012419
30  2020-12-07  AUBANK      898.65  1.033347
31  2020-12-08  AUBANK      907.80  1.043868
32  2020-12-09  AUBANK      918.90  1.056632
33  2020-12-10  AUBANK      911.05  1.047605
34  2020-12-11  AUBANK      920.30  1.058242
35  2020-12-14  AUBANK      929.45  1.068763
36  2020-12-15  AUBANK      922.60  1.060887
37  2020-12-16  AUBANK      915.80  1.053067
38  2020-12-17  AUBANK      943.15  1.084517
39  2020-12-18  AUBANK      897.00  1.031449
40  2020-12-21  AUBANK      840.45  0.966423
41  2020-12-22  AUBANK      856.00  0.984304
42  2020-11-23  AARTIDRUGS  711.70  1.000000
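
The numbers in the table can be reproduced on a small sample. The following is only a minimal sketch: it copies six rows from the table above into a frame named df1 and assumes the rows are in date order within each symbol, so that 'first' really is the earliest close.

```python
import pandas as pd

# Six rows copied from the table above; the real frame has more dates and symbols.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-11-23", "2020-11-24", "2020-11-25",
                            "2020-11-23", "2020-11-24", "2020-11-25"]),
    "Symbol": ["APLAPOLLO"] * 3 + ["AUBANK"] * 3,
    "Close": [3247.45, 3219.95, 3220.45, 869.65, 874.35, 856.25],
})

# Make sure each symbol's rows are in date order before taking the first close.
df1 = df1.sort_values(["Symbol", "Date"]).reset_index(drop=True)

# Broadcast each symbol's first close to all of its rows, then divide.
first_close = df1.groupby("Symbol")["Close"].transform("first")
df1["Normalize"] = df1["Close"] / first_close

# The lambda form from the answer should give the same column.
via_lambda = df1.groupby("Symbol")["Close"].transform(lambda x: x / x.iloc[0])

print(df1)
```

The transform('first') form stays vectorized, while the lambda form calls a Python function once per group; on a frame with many symbols the former is likely to be the faster of the two.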

【Discussion】:

【Solution 2】:

I think this is what you are looking for.

df["Normalise"] = df.groupby('Symbol', sort=False)['Close'].rolling(2).apply(lambda x: x.iloc[1] / x.iloc[0]).reset_index(0, drop=True)

【Discussion】:

Avoid using apply because it is slow.
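
Note that the rolling expression in Solution 2 divides each close by the previous day's close within a symbol, which is not the same as dividing by the symbol's first close. If that day-over-day ratio is what is wanted, it can be computed without apply. This is only a sketch: the sample rows are copied from the table in Solution 1 and the DayRatio column name is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the table in Solution 1, already in date order per symbol.
df = pd.DataFrame({
    "Symbol": ["APLAPOLLO"] * 3 + ["AUBANK"] * 3,
    "Close": [3247.45, 3219.95, 3220.45, 869.65, 874.35, 856.25],
})

# Previous close within the same symbol; the first row of each symbol gets NaN.
prev_close = df.groupby("Symbol")["Close"].shift(1)

# Vectorized day-over-day ratio (DayRatio is a hypothetical column name).
df["DayRatio"] = df["Close"] / prev_close

print(df)
```

The first row of every symbol has no previous close and comes out as NaN; chaining .fillna(1), as suggested in the question discussion, sets it to 1 instead.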