【问题标题】:Pandas - create new column with the sum of last N values of another column with groupbyPandas - 使用 groupby 的另一列的最后 N 个值的总和创建新列
【发布时间】:2021-02-25 00:54:21
【问题描述】:

我有这个 df:

       round_id          team       opponent  home_dummy   GC   GP  P
0          1.0       Flamengo    Atlético-MG         1.0  1.0  0.0  0
1          4.0       Flamengo         Grêmio         1.0  1.0  1.0  1
2          5.0       Flamengo       Botafogo         1.0  1.0  1.0  1
3          6.0       Flamengo         Santos         0.0  0.0  1.0  3
4          7.0       Flamengo          Bahia         0.0  3.0  5.0  3
5          8.0       Flamengo      Fortaleza         1.0  1.0  2.0  3
6          9.0       Flamengo     Fluminense         0.0  1.0  2.0  3
7         10.0       Flamengo          Ceará         0.0  2.0  0.0  0
8          3.0       Flamengo       Coritiba         0.0  0.0  1.0  3
9         11.0       Flamengo          Goiás         1.0  1.0  2.0  3
10        13.0       Flamengo   Athlético-PR         1.0  1.0  3.0  3
11        14.0       Flamengo          Sport         1.0  0.0  3.0  3
12        15.0       Flamengo          Vasco         0.0  1.0  2.0  3
13        16.0       Flamengo     Bragantino         1.0  1.0  1.0  1
14        17.0       Flamengo    Corinthians         0.0  1.0  5.0  3
15        18.0       Flamengo  Internacional         0.0  2.0  2.0  1
16        19.0       Flamengo      São Paulo         1.0  4.0  1.0  0
17        12.0       Flamengo      Palmeiras         0.0  1.0  1.0  1
18         2.0       Flamengo    Atlético-GO         0.0  3.0  0.0  0
19        20.0       Flamengo    Atlético-MG         0.0  4.0  0.0  0

我添加了一个新列,它创建了一个新列,其中包含另一个的最后 N 个值的总和,如下所示:

df['Last_5'] = df.P.rolling(5,min_periods=1).sum().shift().fillna(0)

这给了我:

      round_id           team       opponent  home_dummy   GC   GP  P  last_5
0          1.0       Flamengo    Atlético-MG         1.0  1.0  0.0  0  0
1          4.0       Flamengo         Grêmio         1.0  1.0  1.0  1  0
2          5.0       Flamengo       Botafogo         1.0  1.0  1.0  1  1
3          6.0       Flamengo         Santos         0.0  0.0  1.0  3  2
4          7.0       Flamengo          Bahia         0.0  3.0  5.0  3  5
5          8.0       Flamengo      Fortaleza         1.0  1.0  2.0  3  8
6          9.0       Flamengo     Fluminense         0.0  1.0  2.0  3 11
7         10.0       Flamengo          Ceará         0.0  2.0  0.0  0 13
8          3.0       Flamengo       Coritiba         0.0  0.0  1.0  3 12
9         11.0       Flamengo          Goiás         1.0  1.0  2.0  3 12
10        13.0       Flamengo   Athlético-PR         1.0  1.0  3.0  3 12
11        14.0       Flamengo          Sport         1.0  0.0  3.0  3 12
12        15.0       Flamengo          Vasco         0.0  1.0  2.0  3 12
13        16.0       Flamengo     Bragantino         1.0  1.0  1.0  1 15
14        17.0       Flamengo    Corinthians         0.0  1.0  5.0  3 13
15        18.0       Flamengo  Internacional         0.0  2.0  2.0  1 11
16        19.0       Flamengo      São Paulo         1.0  4.0  1.0  0  8
17        12.0       Flamengo      Palmeiras         0.0  1.0  1.0  1  8
18         2.0       Flamengo    Atlético-GO         0.0  3.0  0.0  0  6
19        20.0       Flamengo    Atlético-MG         0.0  4.0  0.0  0  5

但是假设我在同一个数据框中有很多团队:

       round_id        team      opponent  home_dummy   GC   GP /
0          1.0     Flamengo   Atlético-MG         1.0  1.0  0.0
1          4.0     Flamengo        Grêmio         1.0  1.0  1.0
2          5.0     Flamengo      Botafogo         1.0  1.0  1.0
3          6.0     Flamengo        Santos         0.0  0.0  1.0
4          7.0     Flamengo         Bahia         0.0  3.0  5.0
..         ...          ...           ...         ...  ...  ...
395       15.0  Atlético-GO    Bragantino         1.0  1.0  2.0
396       16.0  Atlético-GO        Santos         0.0  0.0  1.0
397       17.0  Atlético-GO  Athlético-PR         1.0  1.0  1.0
398        9.0  Atlético-GO         Vasco         0.0  1.0  2.0
399       20.0  Atlético-GO   Corinthians         1.0  1.0  1.0

如何对每个团队应用相同的计算并获得相同的结果,而不会在团队之间重叠最后 N 行?

【问题讨论】:

标签: pandas


【解决方案1】:

添加groupby

df['Last_5'] = df.groupby('team').P.apply(lambda x : x.rolling(5,min_periods=1).sum().shift().fillna(0))

【讨论】:

    猜你喜欢
    • 2023-03-11
    • 2022-10-13
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2023-01-18
    • 2023-03-25
    相关资源
    最近更新 更多