[Title]: Apply Coalesce after grouping on two columns in pandas
[Posted]: 2021-02-22 17:18:52
[Question]:

Please help. I have a dataframe like this:

+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
|    |    ID | Result   |   measurement_1 |   measurement_2 |   measurement_3 |   measurement_4 |   measurement_5 | start_time                 | end-time                   |
|----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------|
|  0 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-20 21:24:03.390000 | 2020-10-20 23:46:36.990000 |
|  1 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-21 04:36:03.390000 | 2020-10-21 06:58:36.990000 |
|  2 | 12345 | nan      |           49584 |            2827 |             nan |             nan |             nan | 2020-10-21 09:24:03.390000 | 2020-10-21 11:46:36.990000 |
|  3 | 12345 | nan      |             nan |             nan |            3940 |             nan |             nan | 2020-10-21 14:12:03.390000 | 2020-10-21 16:34:36.990000 |
|  4 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-21 21:24:03.390000 | 2020-10-21 23:46:36.990000 |
|  5 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-22 02:40:51.390000 | 2020-10-22 05:03:24.990000 |
|  6 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-22 08:26:27.390000 | 2020-10-22 10:49:00.990000 |
|  7 | 12345 | Pass     |             nan |             nan |             nan |             392 |             304 | 2020-10-22 14:12:03.390000 | 2020-10-22 16:34:36.990000 |
|  8 | 12346 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-22 19:57:39.390000 | 2020-10-22 22:20:12.990000 |
|  9 | 12346 | nan      |           22839 |            4059 |             nan |             nan |             nan | 2020-10-23 01:43:15.390000 | 2020-10-23 04:05:48.990000 |
| 10 | 12346 | nan      |             nan |             nan |            4059 |             nan |             nan | 2020-10-23 07:28:51.390000 | 2020-10-23 09:51:24.990000 |
| 11 | 12346 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-23 13:14:27.390000 | 2020-10-23 15:37:00.990000 |
| 12 | 12346 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-23 19:00:03.390000 | 2020-10-23 21:22:36.990000 |
| 13 | 12346 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-24 00:45:39.390000 | 2020-10-24 03:08:12.990000 |
| 14 | 12346 | Fail     |             nan |             nan |             nan |            2938 |             495 | 2020-10-24 06:31:15.390000 | 2020-10-24 08:53:48.990000 |
| 15 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-24 12:16:51.390000 | 2020-10-24 14:39:24.990000 |
| 16 | 12345 | nan      |           62839 |            1827 |             nan |             nan |             nan | 2020-10-24 18:02:27.390000 | 2020-10-24 20:25:00.990000 |
| 17 | 12345 | nan      |             nan |             nan |            2726 |             nan |             nan | 2020-10-24 23:48:03.390000 | 2020-10-25 02:10:36.990000 |
| 18 | 12345 | nan      |             nan |             nan |             nan |             nan |             nan | 2020-10-25 05:33:39.390000 | 2020-10-25 07:56:12.990000 |
| 19 | 12345 | Fail     |             nan |             nan |             nan |             nan |            1827 | 2020-10-25 11:19:15.390000 | 2020-10-25 13:41:48.990000 |
+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+

and I want my output to look like:

+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
|    |    ID | Result   |   measurement_1 |   measurement_2 |   measurement_3 |   measurement_4 |   measurement_5 | start_time                 | end-time                   |
|----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------|
|  0 | 12345 | Pass     |           49584 |            2827 |            3940 |             392 |             304 | 2020-10-20 21:24:03.390000 | 2020-10-22 16:34:36.990000 |
|  1 | 12346 | Fail     |           22839 |            4059 |            4059 |            2938 |             495 | 2020-10-22 19:57:39.390000 | 2020-10-24 08:53:48.990000 |
|  2 | 12345 | Fail     |           62839 |            1827 |            2726 |             nan |            1827 | 2020-10-24 12:16:51.390000 | 2020-10-25 13:41:48.990000 |
+----+-------+----------+-----------------+-----------------+-----------------+-----------------+-----------------+----------------------------+----------------------------+
So far I am able to group the columns on `ID` and `Result`. Now I want to apply a coalesce to the grouped result (`newDf`):


    import pandas as pd

    df = pd.read_excel("Test_Coalesce.xlsx")
    # NB: groupby drops the rows where Result is NaN
    newDf = df.groupby(['ID', 'Result'])
    newDf.all().reset_index()
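
For context on the coalesce part: pandas' `GroupBy.first()` skips NaN by default, so it already returns the first non-null value per column within each group. A minimal sketch on toy data (illustrative only, not the question's Excel file):

    import numpy as np
    import pandas as pd

    # Toy frame, for illustration only
    toy = pd.DataFrame({'ID': [12345, 12345, 12345],
                        'measurement_1': [np.nan, 49584, np.nan],
                        'measurement_2': [np.nan, np.nan, 2827]})

    # first() takes the first non-null value per column -> a per-group coalesce
    print(toy.groupby('ID').first())
    #        measurement_1  measurement_2
    # ID
    # 12345        49584.0         2827.0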

[Comments]:

    Tags: python pandas numpy dataframe nan


    [Solution 1]:

    It looks like you want to group consecutive blocks of `ID`. If so:

    # A new block starts wherever ID differs from the previous row
    blocks = df['ID'].ne(df['ID'].shift()).cumsum()

    # 'first' skips NaN (a per-column coalesce); keep the last end-time
    agg_dict = {k: 'first' if k != 'end-time' else 'last'
                for k in df.columns}
    df.groupby(blocks).agg(agg_dict)
    
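    To make this concrete, here is a self-contained sketch on made-up sample data (values and timestamps are illustrative, trimmed from the question's table):

    import numpy as np
    import pandas as pd

    # Illustrative subset of the question's data
    df = pd.DataFrame({
        'ID':     [12345, 12345, 12346, 12346, 12345],
        'Result': [np.nan, 'Pass', np.nan, 'Fail', 'Fail'],
        'measurement_1': [49584, np.nan, 22839, np.nan, 62839],
        'start_time': pd.to_datetime(['2020-10-20 21:24', '2020-10-22 14:12',
                                      '2020-10-22 19:57', '2020-10-24 06:31',
                                      '2020-10-24 12:16']),
        'end-time':   pd.to_datetime(['2020-10-20 23:46', '2020-10-22 16:34',
                                      '2020-10-22 22:20', '2020-10-24 08:53',
                                      '2020-10-25 13:41']),
    })

    blocks = df['ID'].ne(df['ID'].shift()).cumsum()
    agg_dict = {k: 'first' if k != 'end-time' else 'last'
                for k in df.columns}
    # Yields one row per block: (12345, Pass), (12346, Fail), (12345, Fail)
    print(df.groupby(blocks).agg(agg_dict))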

    [Discussion]:

    • Cool. Can we reset the index?
    • Yes, you can pass as_index=False to groupby, or chain with reset_index(drop=True).
    • Awesome!!! One important edit: start_time should be the minimum time and end-time should be the maximum.
    • @Hayat That is handled by the 'first'/'last' in agg_dict. You can change them to 'min'/'max' if you like.
    • Awesome.. you helped a lot.
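
    Putting the thread's suggestions together, a sketch assuming the df and blocks from the snippet above, with 'min'/'max' for the time columns and a flat integer index:

    agg_dict = {k: 'first' for k in df.columns}
    agg_dict['start_time'] = 'min'   # earliest start in each block
    agg_dict['end-time'] = 'max'     # latest end in each block

    result = df.groupby(blocks).agg(agg_dict).reset_index(drop=True)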