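【Question title】: pandas groupby and subtract the first value of one column from the last value of another column
【Posted】: 2023-03-11 10:21:01
【Question】:

I am trying to add a new column that contains, for each T_Id group, the difference between the last value of one column and the first value of another column. I am using this command: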

df['diff']=df.groupby(['T_Id'])['EndMeterReading'].max()-df['StartMeterReading'].min()

but it fills the new column with NaN.

How can I get the result I want?

Original DataFrame:

+------+-------+--------------+------------+
| D_Id | T_Id  | StartReading | EndReading |
+------+-------+--------------+------------+
|    1 | 4716a |      4323.17 |     4324.8 |
|    1 | 4716a |      4324.96 |    4325.34 |
|    1 | 4716a |      4326.47 |    4327.22 |
|    1 | 4716a |       4327.4 |    4328.43 |
|    1 | 4716a |      4328.85 |    4330.73 |
|    1 | 4716b |      4346.65 |    4347.62 |
|    1 | 4716b |      4347.67 |    4349.88 |
|    1 | 4716b |      4351.62 |    4351.83 |
|    1 | 4716b |      4352.88 |    4354.32 |
|    1 | 4716b |      4354.93 |    4355.14 |
|    1 | 4716b |       4355.2 |    4355.82 |
|    1 | 4716b |      4356.91 |    4357.37 |
|    1 | 4716b |      4357.74 |    4358.26 |
|    1 | 4716b |      4359.89 |    4360.46 |
|    1 | 4716b |      4360.61 |    4361.43 |
|    1 | 4716b |      4361.47 |    4362.11 |
|    1 | 4716b |      4362.88 |    4368.49 |
|    1 | 4716b |      4368.94 |    4369.78 |
|    1 | 4716b |      4370.91 |    4371.25 |
|    1 | 4716b |      4372.67 |    4372.77 |
+------+-------+--------------+------------+

Desired output:

+------+-------+--------------+------------+------------------+
| D_Id | T_Id  | StartReading | EndReading |       Diff       |
+------+-------+--------------+------------+------------------+
|    1 | 4716a |      4323.17 |     4324.8 |             7.56 |
|    1 | 4716a |      4324.96 |    4325.34 |             7.56 |
|    1 | 4716a |      4326.47 |    4327.22 |             7.56 |
|    1 | 4716a |       4327.4 |    4328.43 |             7.56 |
|    1 | 4716a |      4328.85 |    4330.73 |             7.56 |
|    1 | 4716b |      4346.65 |    4347.62 |            26.12 |
|    1 | 4716b |      4347.67 |    4349.88 |            26.12 |
|    1 | 4716b |      4351.62 |    4351.83 |            26.12 |
|    1 | 4716b |      4352.88 |    4354.32 |            26.12 |
|    1 | 4716b |      4354.93 |    4355.14 |            26.12 |
|    1 | 4716b |       4355.2 |    4355.82 |            26.12 |
|    1 | 4716b |      4356.91 |    4357.37 |            26.12 |
|    1 | 4716b |      4357.74 |    4358.26 |            26.12 |
|    1 | 4716b |      4359.89 |    4360.46 |            26.12 |
|    1 | 4716b |      4360.61 |    4361.43 |            26.12 |
|    1 | 4716b |      4361.47 |    4362.11 |            26.12 |
|    1 | 4716b |      4362.88 |    4368.49 |            26.12 |
|    1 | 4716b |      4368.94 |    4369.78 |            26.12 |
|    1 | 4716b |      4370.91 |    4371.25 |            26.12 |
|    1 | 4716b |      4372.67 |    4372.77 |            26.12 |
+------+-------+--------------+------------+------------------+

【Question comments】:

    Tags: python pandas dataframe group-by


    【Solution 1】:

    Use groupby to find the first and last values per group, then merge the result back into the original df:

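    # First StartReading and last EndReading for each T_Id
    df2 = df.groupby(['T_Id']).agg({'StartReading' : 'first', 'EndReading' : 'last'}).reset_index(0)
    df2['Diff'] = df2['EndReading'] - df2['StartReading']
    # merge returns a new DataFrame, so assign it back to keep the Diff column
    df = df.merge(df2[['T_Id', 'Diff']], how='left', on='T_Id')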
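
A self-contained sketch of the groupby-then-merge approach, for readers who want to run it. The five-row sample is an assumption made for brevity (its values are copied from the first rows of each group in the question), and the names `per_group` and `result` are illustrative only.

```python
import pandas as pd

# Reduced sample (assumption): first three rows of 4716a, first two of 4716b
df = pd.DataFrame({
    'D_Id': [1, 1, 1, 1, 1],
    'T_Id': ['4716a', '4716a', '4716a', '4716b', '4716b'],
    'StartReading': [4323.17, 4324.96, 4326.47, 4346.65, 4347.67],
    'EndReading': [4324.80, 4325.34, 4327.22, 4347.62, 4349.88],
})

# One row per T_Id: first StartReading and last EndReading
per_group = (
    df.groupby('T_Id')
      .agg({'StartReading': 'first', 'EndReading': 'last'})
      .reset_index()
)
# Round to two decimals to hide floating-point noise in the subtraction
per_group['Diff'] = (per_group['EndReading'] - per_group['StartReading']).round(2)

# Left merge repeats each group's Diff on every row of that group
result = df.merge(per_group[['T_Id', 'Diff']], how='left', on='T_Id')
print(result)
```

The left merge keeps every row of `df` in its original order, so the result has the same number of rows as the input with one extra `Diff` column.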
    

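    【Comments】:

      【Solution 2】:

      Use GroupBy.transform with the max and min functions. transform returns a Series of the same size as the original DataFrame, so the two results can be subtracted correctly: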

      df['diff']= (df.groupby('T_Id')['EndReading'].transform('max')-
                   df.groupby('T_Id')['StartReading'].transform('min'))
      
      print (df)
          D_Id   T_Id  StartReading  EndReading   diff
      0      1  4716a       4323.17     4324.80   7.56
      1      1  4716a       4324.96     4325.34   7.56
      2      1  4716a       4326.47     4327.22   7.56
      3      1  4716a       4327.40     4328.43   7.56
      4      1  4716a       4328.85     4330.73   7.56
      5      1  4716b       4346.65     4347.62  26.12
      6      1  4716b       4347.67     4349.88  26.12
      7      1  4716b       4351.62     4351.83  26.12
      8      1  4716b       4352.88     4354.32  26.12
      9      1  4716b       4354.93     4355.14  26.12
      10     1  4716b       4355.20     4355.82  26.12
      11     1  4716b       4356.91     4357.37  26.12
      12     1  4716b       4357.74     4358.26  26.12
      13     1  4716b       4359.89     4360.46  26.12
      14     1  4716b       4360.61     4361.43  26.12
      15     1  4716b       4361.47     4362.11  26.12
      16     1  4716b       4362.88     4368.49  26.12
      17     1  4716b       4368.94     4369.78  26.12
      18     1  4716b       4370.91     4371.25  26.12
      19     1  4716b       4372.67     4372.77  26.12
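
The sketch below illustrates why the attempt in the question likely yields NaN, and shows a transform variant that uses 'first' and 'last' instead of 'min' and 'max'. It assumes a reduced five-row sample with values copied from the question; the column name `bad` and the variable names are illustrative. With monotonically increasing meter readings the two variants give the same result, while 'first'/'last' follows the question's wording more literally if the readings are not sorted.

```python
import pandas as pd

# Reduced sample (assumption): values copied from the question's table
df = pd.DataFrame({
    'T_Id': ['4716a', '4716a', '4716a', '4716b', '4716b'],
    'StartReading': [4323.17, 4324.96, 4326.47, 4346.65, 4347.67],
    'EndReading': [4324.80, 4325.34, 4327.22, 4347.62, 4349.88],
})

# An aggregation returns one value per group, indexed by T_Id ('4716a', '4716b')
per_group_max = df.groupby('T_Id')['EndReading'].max()

# Column assignment aligns on index labels; none of them match 0..4, so every row is NaN
df['bad'] = per_group_max - df['StartReading'].min()

# transform broadcasts each group's value back to the original row index
grouped = df.groupby('T_Id')
df['diff'] = (grouped['EndReading'].transform('last')
              - grouped['StartReading'].transform('first')).round(2)
print(df)
```

Note also that `df['StartMeterReading'].min()` in the question is a single scalar for the whole frame, not a per-group minimum, so even with matching indexes it would not produce the per-group difference.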
      

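      【Comments】:

      • Can you tell me how to add a new column to the df above that is filled with 0, except for the last occurrence/row of each group, which gets 1?
      • @M_S_N - Which indices of the new column are filled with 1, and which with 0?
      • Like filling the first occurrences of 4716a with 0,0,0,0, but the last occurrence gets 1 in the new column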
      • @M_S_N So indices 0 and 5 are 1, and all other values are 0 in the new column?
      • @M_S_N - Can you check df['new'] = (~df['T_Id'].duplicated(keep='last')).astype(int)?
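
A minimal sketch of the one-liner suggested in the last comment, run on a reduced T_Id column (an assumption for brevity). `Series.duplicated(keep='last')` marks every occurrence of a value as a duplicate except the last one, so negating it flags only the last row of each T_Id.

```python
import pandas as pd

# Reduced sample (assumption): three rows of 4716a, two rows of 4716b
df = pd.DataFrame({'T_Id': ['4716a', '4716a', '4716a', '4716b', '4716b']})

# False only on the last occurrence of each T_Id; invert and cast to 0/1
df['new'] = (~df['T_Id'].duplicated(keep='last')).astype(int)
print(df['new'].tolist())  # [0, 0, 1, 0, 1]
```

This flags the last occurrence of each value in the whole column, so it matches "last row of each group" whether or not the rows of a group are contiguous.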