【Question title】: Comma-separated values in columns as rows in pandas
【Posted】: 2022-01-18 07:13:24
【Question】:

I have a pandas DataFrame, shown below, in which the elements of the info column are identical for each unique file in the id column:

id   text         info
1    great        boy,police
1    excellent    boy,police
2    nice         girl,mother,teacher
2    good         girl,mother,teacher
2    bad          girl,mother,teacher
3    awesome      grandmother
4    superb       grandson

I want each list element to appear on its own row within each file, like this:

id   text         info
1    great        boy
1    excellent    police
2    nice         girl
2    good         mother
2    bad          teacher
3    awesome      grandmother
4    superb       grandson
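
For anyone who wants to reproduce this, the input frame can be built roughly as follows. This is a minimal sketch that assumes id is an integer column and text and info are plain strings; the original post does not state the dtypes.

```python
import pandas as pd

# Sample input from the question; the dtypes are an assumption.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": [
        "boy,police", "boy,police",
        "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
        "grandmother", "grandson",
    ],
})

print(df)
```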

【Comments】:

    Tags: python-3.x pandas dataframe


    【Solution 1】:

    Let's try:

    df['new'] = df.loc[~df.id.duplicated(),'info'].str.split(',').explode().values
    df
       id       text                 info          new
    0   1      great           boy,police          boy
    1   1  excellent           boy,police       police
    2   2       nice  girl,mother,teacher         girl
    3   2       good  girl,mother,teacher       mother
    4   2        bad  girl,mother,teacher      teacher
    5   3    awesome          grandmother  grandmother
    6   4     superb             grandson     grandson
    

    【Discussion】:

      【Solution 2】:

      Take advantage of the fact that 'info' is repeated within each id.

      df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()
      

      Output:

         id       text         info
      0   1      great          boy
      1   1  excellent       police
      2   2       nice         girl
      3   2       good       mother
      4   2        bad      teacher
      5   3    awesome  grandmother
      6   4     superb     grandson
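
One thing that may be worth guarding against: drop_duplicates() compares info values across the whole column, so the one-liner relies on no two ids sharing the same info string and on the total number of split elements matching the number of rows. The sketch below wraps the same idea in a hypothetical helper, spread_info (not part of pandas), that checks the length before assigning. It also assumes the rows of each id are stored contiguously.

```python
import pandas as pd

def spread_info(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same idea as the one-liner, plus a length check."""
    exploded = df["info"].drop_duplicates().str.split(",").explode()
    if len(exploded) != len(df):
        # Happens when two ids share an info string or the counts differ.
        raise ValueError(
            f"{len(exploded)} split elements cannot fill {len(df)} rows"
        )
    out = df.copy()
    out["info"] = exploded.to_numpy()
    return out

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": [
        "boy,police", "boy,police",
        "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
        "grandmother", "grandson",
    ],
})

result = spread_info(df)
print(result)
```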
      

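      【Discussion】:

        【Solution 3】:

        One way, using pandas.DataFrame.groupby.transform.

        Note that this assumes:

        1. After splitting info on ',', the number of elements equals the number of rows for each id.
        2. The info values are identical within the same id.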

        df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
        print(df)
        

        Output:

           id       text         info
        0   1      great          boy
        1   1  excellent       police
        2   2       nice         girl
        3   2       good       mother
        4   2        bad      teacher
        5   3    awesome  grandmother
        6   4     superb     grandson
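
The two assumptions listed for this solution can be checked before transforming. The following is only a sketch of one way to do that; the variable names (same_within_id, lengths_match, assumptions_hold) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": [
        "boy,police", "boy,police",
        "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
        "grandmother", "grandson",
    ],
})

grouped = df.groupby("id")["info"]

# Assumption 2: every id carries exactly one distinct info string.
same_within_id = grouped.nunique().eq(1)

# Assumption 1: the number of comma-separated items equals the rows per id.
n_rows = grouped.size()
n_items = grouped.first().str.count(",") + 1
lengths_match = n_rows.eq(n_items)

assumptions_hold = bool(same_within_id.all() and lengths_match.all())
print(assumptions_hold)
```

If the check comes back False, the transform is likely to fail or to put values on the wrong rows, so it seems safer to inspect the offending ids first.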
        

        【Discussion】:

          【Solution 4】:

          Create a temporary variable that holds each row's position within its info group:

          temp = df.groupby('info').cumcount()
          

          Then use a list comprehension to pick the matching element from each info string:

          df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]
          
          df
          
             id       text         info
          0   1      great          boy
          1   1  excellent       police
          2   2       nice         girl
          3   2       good       mother
          4   2        bad      teacher
          5   3    awesome  grandmother
          6   4     superb     grandson
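
To make the intermediate step visible, this sketch prints what the temporary variable holds for the sample data. Note that it groups on info, as in the answer, so it assumes that different ids never share the same info string; grouping on id instead would drop that assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": [
        "boy,police", "boy,police",
        "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
        "grandmother", "grandson",
    ],
})

# Zero-based position of each row within its info group.
temp = df.groupby("info").cumcount()
print(temp.tolist())  # [0, 1, 0, 1, 2, 0, 0]

# Pick the element at that position from each split info string.
picked = [ent.split(",")[pos] for ent, pos in zip(df["info"], temp)]
print(picked)
```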
          

          【Discussion】:

            【Solution 5】:

            Or try apply:

            df['info'] = pd.DataFrame({'info': df['info'].str.split(','), 'n': df.groupby('id').cumcount()}).apply(lambda x: x['info'][x['n']], axis=1)
            

            Output:

            >>> df
               id       text         info
            0   1      great          boy
            1   1  excellent       police
            2   2       nice         girl
            3   2       good       mother
            4   2        bad      teacher
            5   3    awesome  grandmother
            6   4     superb     grandson
            >>> 
            

            【Discussion】:
