【Question title】: Python: Pandas pivot table for multiple columns at once which has duplicate values
【Posted】: 2022-01-01 13:49:35
【Question】:

I have a pandas DataFrame with name, school, and marks columns:

name  school  marks

tom     HBS     55
tom     HBS     55
tom     HBS     14
mark    HBS     28
mark    HBS     19
lewis   HBS     88

How can I transpose and convert it into this?

name  school  marks_1 marks_2 marks_3

tom     HBS     55     55       14
mark    HBS     28     19
lewis   HBS     88

I tried this:

df = df.pivot_table(index='name', values='marks', columns='school') \
    .reset_index() \
    .rename_axis(None, axis=1)

print(df)
df = df.pivot('name','marks','school')

I checked these links:

https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape-by-two-variables
https://stackoverflow.com/questions/62391419/pandas-group-by-and-convert-rows-into-multiple-columns
https://stackoverflow.com/questions/60698109/pandas-multiple-rows-to-single-row-with-multiple-columns-on-2-indexes

I get the error below because of the duplicate values. How can duplicates be handled when they have to be kept?

ValueError: Index contains duplicate entries, cannot reshape

【Comments】:

    Tags: python pandas dataframe group-by pivot


    【Solution 1】:

    Try using set_index and unstack together with groupby and cumcount:

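    # number repeated (name, school) rows 1, 2, 3, ... so every index entry is unique
    occurrence = df.groupby(['name', 'school']).cumcount() + 1
    df_out = df.set_index(['name', 'school', occurrence]).unstack()
    # flatten the ('marks', n) column MultiIndex into marks_n
    df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
    df_out = df_out.reset_index()
    df_out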
    

    Output:

        name school  marks_1  marks_2  marks_3
    0  lewis    HBS     88.0      NaN      NaN
    1   mark    HBS     28.0     19.0      NaN
    2    tom    HBS     55.0     55.0     14.0
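
To see what cumcount contributes here, the following is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question; the names `occurrence` and `keyed` are only illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 55, 14, 28, 19, 88],
})

# cumcount numbers the rows inside each (name, school) group starting at 0;
# adding 1 gives the 1-based suffix used for marks_1, marks_2, ...
occurrence = df.groupby(["name", "school"]).cumcount() + 1
print(occurrence.tolist())  # [1, 2, 3, 1, 2, 1]

# Without the counter the (name, school) key repeats, which is what makes
# the reshape fail; with the counter added, every key is unique.
print(df.duplicated(["name", "school"]).any())  # True
keyed = df.assign(occurrence=occurrence)
print(keyed.duplicated(["name", "school", "occurrence"]).any())  # False
```

Because the counter depends only on row order within each group, both copies of tom's 55 survive as separate columns instead of being aggregated away.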
    

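    【Discussion】:

      【Solution 2】:

      The cumcount function lets you create a unique index before pivoting. This builds on the same idea as @ScottBoston's answer; here, however, the pivot function is used: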

      index = ['name', 'school']
      
      # create an extra column for uniqueness
      temp = (df.assign(counter = df.groupby(index)
                                    .cumcount()
                                    .add(1)
                                    .astype(str))
                .pivot(index = index, columns = 'counter')
              )
      
      # flatten the columns
      temp.columns = temp.columns.map('_'.join)
      
      temp.reset_index()
      
          name school  marks_1  marks_2  marks_3
      0  lewis    HBS     88.0      NaN      NaN
      1   mark    HBS     28.0     19.0      NaN
      2    tom    HBS     55.0     55.0     14.0
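
The same idea can be wrapped in a small helper, sketched below under a few assumptions. The function name `widen_duplicates` and its parameters are hypothetical, not part of pandas or of the answer. It keeps the counter as an integer rather than a string, so that columns still sort numerically if a group has ten or more rows, and it assumes pandas 1.1 or later, where pivot accepts a list for `index`.

```python
import pandas as pd


def widen_duplicates(df, index, value):
    """Spread repeated `value` rows per `index` group into value_1, value_2, ...

    Hypothetical helper for illustration; `index` is a list of column names.
    """
    # 1-based position of each row inside its group (integer, not string)
    counter = df.groupby(index).cumcount().add(1)
    wide = (
        df[index + [value]]
        .assign(counter=counter)
        .pivot(index=index, columns="counter", values=value)
    )
    # column labels are the integers 1, 2, 3, ...; turn them into value_n
    wide.columns = [f"{value}_{n}" for n in wide.columns]
    return wide.reset_index()


# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 55, 14, 28, 19, 88],
})

out = widen_duplicates(df, ["name", "school"], "marks")
print(out)
```

Missing positions come back as NaN, so the marks columns are floats, as in the output shown above.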
      

      Alternatively, you can use the pivot_wider function from pyjanitor, which is syntactic sugar around pd.pivot with a few added conveniences:

      # pip install pyjanitor
      import pandas as pd
      import janitor
      (df.assign(counter = df.groupby(index)
                             .cumcount()
                             .add(1))                              
         .pivot_wider(index = index, 
                      names_from = 'counter', 
                      names_sep = '_')
      )
      
          name school  marks_1  marks_2  marks_3
      0  lewis    HBS     88.0      NaN      NaN
      1   mark    HBS     28.0     19.0      NaN
      2    tom    HBS     55.0     55.0     14.0
      

      【Discussion】:
