【Question title】: How to get the sum of rows while a column keeps the same value?
【Posted】: 2019-02-27 16:48:58
【Question】:

I have a dataset like this:

 time(secs) setup
     40     setup1
     30     setup1       
     20     setup1
     10     setup2
     20     setup2 
     10     setup1
     30     setup1
     30     setup2
     40     setup2
     10     setup3
     20     setup3

I want to sum the rows in a pandas DataFrame separately for each run of consecutive rows that share the same setup value:

  time(secs)  setup
    90        setup1
    30        setup2
    40        setup1
    70        setup2
    30        setup3

But when I use the groupby() function:

  df.groupby(['setup']).sum()

the result I get is:

  setup      time 

  setup1      130 
  setup2      100
  setup3       30

Please help me solve this...

Thanks!!!

【Comments】:

    Tags: python-2.7 pandas sum grouping pandas-groupby


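    【Solution 1】:

    Use groupby and aggregate with sum and first, grouping by a helper Series built by comparing the column with its shifted copy via Series.ne (!=) and then taking cumsum: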

    df1 = (df.groupby(df['setup'].ne(df['setup'].shift()).cumsum(), as_index=False)
             .agg({'time(secs)':'sum', 'setup':'first'}))
    print (df1)
       time(secs)   setup
    0          90  setup1
    1          30  setup2
    2          40  setup1
    3          70  setup2
    4          30  setup3
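
For readers who want to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the names `run_id` and `'run'` are illustrative assumptions, not part of the original answer. It also uses `reset_index(drop=True)` in place of `as_index=False`, which should give the same table here while avoiding any dependence on how a given pandas version treats a grouping key that shares its name with a column.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'time(secs)': [40, 30, 20, 10, 20, 10, 30, 30, 40, 10, 20],
    'setup': (['setup1'] * 3 + ['setup2'] * 2 + ['setup1'] * 2
              + ['setup2'] * 2 + ['setup3'] * 2),
})

# A new run starts wherever 'setup' differs from the previous row;
# the running count of those starts gives one id per consecutive run.
# 'run' is a hypothetical name chosen so it cannot clash with a column.
run_id = df['setup'].ne(df['setup'].shift()).cumsum().rename('run')

df1 = (df.groupby(run_id)
         .agg({'time(secs)': 'sum', 'setup': 'first'})
         .reset_index(drop=True))
print(df1)
```

Because the grouping key is the run id rather than the setup label, the two separate blocks of setup1 (and of setup2) stay separate instead of being merged.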
    

    Details:

    print (df['setup'].ne(df['setup'].shift()).cumsum())
    0     1
    1     1
    2     1
    3     2
    4     2
    5     3
    6     3
    7     4
    8     4
    9     5
    10    5
    Name: setup, dtype: int32
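
To see why the helper Series ends up with one number per run, it may help to look at each intermediate step on its own. The sketch below uses a shortened, made-up column (`'a'`, `'b'`) rather than the question's data, and the variable names `prev`, `is_new_run` and `run_id` are assumptions for illustration.

```python
import pandas as pd

# Shortened, hypothetical column: two runs of 'a' separated by a run of 'b'.
setup = pd.Series(['a', 'a', 'b', 'b', 'a'], name='setup')

prev = setup.shift()          # value from the previous row; missing for row 0
is_new_run = setup.ne(prev)   # True where the value changes (row 0 is True too)
run_id = is_new_run.cumsum()  # True counts as 1, so every change bumps the id

steps = pd.DataFrame({'setup': setup, 'prev': prev,
                      'is_new_run': is_new_run, 'run_id': run_id})
print(steps)
```

The first row always counts as a change because it is compared with a missing value, which is why the ids start at 1 rather than 0.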
    

    A similar solution that stores the helper values in a new column:

    df['groups'] = df['setup'].ne(df['setup'].shift()).cumsum()
    print (df)
        time(secs)   setup  groups
    0           40  setup1       1
    1           30  setup1       1
    2           20  setup1       1
    3           10  setup2       2
    4           20  setup2       2
    5           10  setup1       3
    6           30  setup1       3
    7           30  setup2       4
    8           40  setup2       4
    9           10  setup3       5
    10          20  setup3       5
    
    df1 = (df.groupby('groups')
             .agg({'time(secs)':'sum', 'setup':'first'})
             .reset_index(drop=True))

    Or group by both columns and drop the helper level afterwards:

    df1 = (df.groupby(['groups', 'setup'])['time(secs)'].sum()
             .reset_index(level=0, drop=True)
             .reset_index())
    
    print (df1)
       time(secs)   setup
    0          90  setup1
    1          30  setup2
    2          40  setup1
    3          70  setup2
    4          30  setup3
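
If the same pattern is needed in several places, the steps could be wrapped in a small helper. The function name `sum_consecutive_runs` and its parameters are hypothetical, chosen only for this sketch; it assumes the rows are already in the order in which runs should be detected. The last lines check that adding the per-run sums back together per setup reproduces the plain `groupby('setup')` totals from the question.

```python
import pandas as pd

def sum_consecutive_runs(df, key, value):
    """Sum `value` over each run of consecutive equal `key` values.

    Hypothetical helper: assumes `df` is already in the desired row order.
    """
    run_id = df[key].ne(df[key].shift()).cumsum().rename('run_id')
    return (df.groupby(run_id)
              .agg({value: 'sum', key: 'first'})
              .reset_index(drop=True))

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'time(secs)': [40, 30, 20, 10, 20, 10, 30, 30, 40, 10, 20],
    'setup': (['setup1'] * 3 + ['setup2'] * 2 + ['setup1'] * 2
              + ['setup2'] * 2 + ['setup3'] * 2),
})

out = sum_consecutive_runs(df, 'setup', 'time(secs)')
print(out)

# Sanity check: per-run sums, re-added per setup, match the plain groupby.
per_setup_from_runs = out.groupby('setup')['time(secs)'].sum()
per_setup_plain = df.groupby('setup')['time(secs)'].sum()
print(per_setup_from_runs.equals(per_setup_plain))
```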
    

    【Comments】:
