【Question title】:Conditional aggregation on dataframe columns, combining 'n' rows into 1 row
【Posted】:2021-01-18 10:05:28
【Question】:

I have an input dataframe like this:

NAME    TEXT                                            START   END
Tim     Tim Wagner is a teacher.                        10      20.5
Tim     He is from Cleveland, Ohio.                     20.5    40
Frank   Frank is a musician.                            40      50
Tim     He like to travel with his family               50      62
Frank   He is a performing artist who plays the cello.  62      70
Frank   He performed at the Carnegie Hall last year.    70      85
Frank   It was fantastic listening to him.              85      90
Frank   I really enjoyed                                90      93

The desired output dataframe is:

NAME    TEXT                                                                                       START       END
Tim     Tim Wagner is a teacher.  He is from Cleveland, Ohio.                                         10          40  
Frank   Frank is a musician                                                                           40          50
Tim     He like to travel with his family                                                             50          62
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year.   62          85
Frank   It was fantastic listening to him. I really enjoyed                                           85          93   

My current code:

# Label each run of consecutive identical NAME values with its own group number
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)[['TEXT', 'START', 'END']]\
  .agg({'TEXT': ' '.join, 'START': 'min', 'END': 'max'})\
  .reset_index().drop('group', axis=1)

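This merges the last 4 rows into one. Instead, I want to merge at most 2 rows (or, more generally, any n rows) at a time, even when more consecutive rows share the same 'NAME' value.

Any help with this is appreciated.

Thanks

【Comments】:

    Tags: python pandas dataframe aggregation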


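    【Solution 1】:

    You can group by the blocks of consecutive identical names and, within each block, by chunks of two rows obtained from cumcount() // 2: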

    blocks = df.NAME.ne(df.NAME.shift()).cumsum()
    
    (df.groupby([blocks, df.groupby(blocks).cumcount()//2])
       .agg({'NAME':'first', 'TEXT':' '.join,
             'START':'min', 'END':'max'})
    )
    

    Output:

             NAME                                               TEXT  START   END
    NAME                                                                         
    1    0    Tim  Tim Wagner is a teacher. He is from Cleveland,...   10.0  40.0
    2    0  Frank                               Frank is a musician.   40.0  50.0
    3    0    Tim                  He like to travel with his family   50.0  62.0
    4    0  Frank  He is a performing artist who plays the cello....   62.0  85.0
         1  Frank  It was fantastic listening to him. I really en...   85.0  93.0
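
The question asks for any n rows, not just 2. A minimal sketch of the same idea with the chunk size as a parameter follows; the function name `merge_consecutive` and the labels `block` and `chunk` are illustrative names that do not appear in the original answer, and the sample dataframe is rebuilt from the question's input.

```python
import pandas as pd


def merge_consecutive(df, n=2):
    """Merge up to n consecutive rows that share the same NAME (hypothetical helper)."""
    # Label each run of consecutive identical NAME values: 1, 1, 2, 3, 4, 4, 4, 4
    blocks = df["NAME"].ne(df["NAME"].shift()).cumsum().rename("block")
    # Number the rows inside each run (0, 1, 2, ...) and cut them into chunks of n
    chunks = (df.groupby(blocks).cumcount() // n).rename("chunk")
    return (
        df.groupby([blocks, chunks], sort=False)
          .agg(NAME=("NAME", "first"), TEXT=("TEXT", " ".join),
               START=("START", "min"), END=("END", "max"))
          .reset_index(drop=True)  # drop the helper block/chunk index levels
    )


# Sample data copied from the question
df = pd.DataFrame({
    "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank", "Frank", "Frank"],
    "TEXT": [
        "Tim Wagner is a teacher.",
        "He is from Cleveland, Ohio.",
        "Frank is a musician.",
        "He like to travel with his family",
        "He is a performing artist who plays the cello.",
        "He performed at the Carnegie Hall last year.",
        "It was fantastic listening to him.",
        "I really enjoyed",
    ],
    "START": [10, 20.5, 40, 50, 62, 70, 85, 90],
    "END": [20.5, 40, 50, 62, 70, 85, 90, 93],
})

result = merge_consecutive(df, n=2)
print(result)
```

With n=2 this should give the same five rows as the output above, but with a plain 0..4 index instead of the two-level one. A larger n merges longer runs; a run shorter than n is simply merged as a whole.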
    
