【Question title】: Cumulative groupby with condition on datetime pandas
【Posted】: 2021-11-01 13:32:54
【Question】:

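I need to compute cumulative sums of several columns in a pandas DataFrame, grouped by the playerId column and accumulated over the dateutc (datetime) column. My DataFrame looks like this: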

      eventId  playerId  goal  shot  header  dateutc
0     0        100       0     1     0       2020-11-08 17:00:00
1     1        100       0     0     1       2020-11-08 17:00:00
2     2        100       1     1     0       2020-11-08 17:00:00
3     3        200       0     1     0       2020-11-08 17:00:00      
4     4        100       1     0     1       2020-11-15 17:00:00
5     5        100       1     1     0       2020-11-15 17:00:00
6     6        200       1     1     0       2020-11-15 17:00:00      

So for each player I need the cumulative sum over the current date and all earlier dates. My final DataFrame should look like this:

      playerId  dateutc              goal  shot  header
0     100       2020-11-08 17:00:00  1     2     1
1     200       2020-11-08 17:00:00  0     1     0
2     100       2020-11-15 17:00:00  3     3     2
3     200       2020-11-15 17:00:00  1     2     0    

Hopefully someone can help me :)

【Discussion】:

    Tags: python pandas dataframe pandas-groupby


    【Solution 1】:

    Try:

    out = df.groupby(['playerId', 'dateutc'], sort=False)[['goal', 'shot', 'header']].sum()
    out = out.groupby(level='playerId').cumsum().reset_index()
    

    Output:

    >>> out
       playerId              dateutc  goal  shot  header
    0       100  2020-11-08 17:00:00     1     2       1
    1       200  2020-11-08 17:00:00     0     1       0
    2       100  2020-11-15 17:00:00     3     3       2
    3       200  2020-11-15 17:00:00     1     2       0
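Solution 1 passes sort=False, so groups keep their order of first appearance and the per-player cumsum is only chronological if the input rows already are. Below is a minimal sketch of an order-independent variant, assuming dateutc has been parsed with pd.to_datetime (ISO-formatted strings would also sort chronologically). With the default sort=True the group keys are ordered by playerId and then dateutc, so the cumsum runs in date order; a final sort_values restores the date-then-player layout of the expected output. The sample data is copied from the question and shuffled on purpose.

```python
import pandas as pd

# Sample data from the question; dateutc parsed to datetimes (assumption).
# The rows are shuffled to show the result does not depend on input order.
df = pd.DataFrame({
    "eventId":  [0, 1, 2, 3, 4, 5, 6],
    "playerId": [100, 100, 100, 200, 100, 100, 200],
    "goal":     [0, 0, 1, 0, 1, 1, 1],
    "shot":     [1, 0, 1, 1, 0, 1, 1],
    "header":   [0, 1, 0, 0, 1, 0, 0],
    "dateutc":  pd.to_datetime(["2020-11-08 17:00:00"] * 4
                               + ["2020-11-15 17:00:00"] * 3),
}).sample(frac=1, random_state=0)

cols = ["goal", "shot", "header"]

# Default sort=True orders the keys by playerId, then dateutc,
# so the per-player cumsum accumulates in chronological order.
out = (df.groupby(["playerId", "dateutc"])[cols]
         .sum()
         .groupby(level="playerId")
         .cumsum()
         .reset_index()
         .sort_values(["dateutc", "playerId"], ignore_index=True))
print(out)
```

The extra sort inside groupby is only worth paying for when the input order is not guaranteed.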
    

    【Discussion】:

      【Solution 2】:

      First drop eventId so that it is not summed (if it is numeric), then aggregate with sum and apply cumsum:

      df1 = (df.drop('eventId',axis=1)
               .groupby(['playerId','dateutc'], sort=False)
               .sum()
               .groupby(level=0, sort=False)
               .cumsum()
               .reset_index())
      print(df1)
         playerId              dateutc  goal  shot  header
      0       100  2020-11-08 17:00:00     1     2       1
      1       200  2020-11-08 17:00:00     0     1       0
      2       100  2020-11-15 17:00:00     3     3       2
      3       200  2020-11-15 17:00:00     1     2       0
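To see why Solution 2 drops eventId first: sum() aggregates every remaining numeric column, so an ID column is summed and accumulated just like the stat columns. A small sketch with the question's sample data (parsing dateutc with pd.to_datetime is an assumption):

```python
import pandas as pd

# Sample data from the question; dateutc parsed to datetimes (assumption).
df = pd.DataFrame({
    "eventId":  [0, 1, 2, 3, 4, 5, 6],
    "playerId": [100, 100, 100, 200, 100, 100, 200],
    "goal":     [0, 0, 1, 0, 1, 1, 1],
    "shot":     [1, 0, 1, 1, 0, 1, 1],
    "header":   [0, 1, 0, 0, 1, 0, 0],
    "dateutc":  pd.to_datetime(["2020-11-08 17:00:00"] * 4
                               + ["2020-11-15 17:00:00"] * 3),
})

# Keeping eventId: being numeric, it is aggregated like a stat column.
with_id = (df.groupby(["playerId", "dateutc"], sort=False)
             .sum()
             .groupby(level=0, sort=False)
             .cumsum())
print(with_id["eventId"].tolist())  # [3, 3, 12, 9]: meaningless "cumulative IDs"

# Dropping it first leaves only the columns that should be accumulated.
df1 = (df.drop("eventId", axis=1)
         .groupby(["playerId", "dateutc"], sort=False)
         .sum()
         .groupby(level=0, sort=False)
         .cumsum()
         .reset_index())
print(df1.columns.tolist())  # ['playerId', 'dateutc', 'goal', 'shot', 'header']
```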
      

      If you need to specify which columns to process:

      df1 = (df.groupby(['playerId','dateutc'], sort=False)[['goal', 'shot', 'header']]
               .sum()
               .groupby(level=0, sort=False)
               .cumsum()
               .reset_index())
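A different technique, not taken from either answer, produces the same table: compute per-player running totals at the level of individual events with groupby(...).cumsum(), then keep only the last event of each (playerId, dateutc) pair, whose running total covers that date and all earlier ones. A sketch under the same assumptions (sample data from the question, dateutc parsed with pd.to_datetime):

```python
import pandas as pd

# Sample data from the question; dateutc parsed to datetimes (assumption).
df = pd.DataFrame({
    "eventId":  [0, 1, 2, 3, 4, 5, 6],
    "playerId": [100, 100, 100, 200, 100, 100, 200],
    "goal":     [0, 0, 1, 0, 1, 1, 1],
    "shot":     [1, 0, 1, 1, 0, 1, 1],
    "header":   [0, 1, 0, 0, 1, 0, 0],
    "dateutc":  pd.to_datetime(["2020-11-08 17:00:00"] * 4
                               + ["2020-11-15 17:00:00"] * 3),
})
cols = ["goal", "shot", "header"]

# Put events in chronological order; a stable sort keeps events that share
# a timestamp in their original order.
df = df.sort_values("dateutc", kind="stable")

# Running totals per player, one row per event (index aligned with df).
running = df.groupby("playerId")[cols].cumsum()

# The last event of each (player, date) pair holds the total up to that date.
out = (df[["playerId", "dateutc"]]
         .join(running)
         .drop_duplicates(subset=["playerId", "dateutc"], keep="last")
         .reset_index(drop=True))
print(out)
```

This avoids the second groupby, at the cost of computing a running total for every event before discarding most of them.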
      

      【Discussion】:
