【Question title】: Summing certain columns in a DataFrame with MultiIndex columns
【Posted】: 2019-02-02 14:00:16
【Question】:

I have a DataFrame created from a pivot table that looks similar to this:

            import pandas as pd
            d = {
                    ('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0,
                    'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
                    ('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 
                    'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0}, 
                    ('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 
                    'April- 2014': 24.0, 'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0},
                    ('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 
                    'April- 2014': 499.0, 'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
                    ('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 
                    'April- 2014': 50.0, 'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
                    ('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0,
                    'August- 2012': 0.0, 'August- 2013': 0.0,'August- 2014': 77.0}
                }
            df = pd.DataFrame(d)

                          company1        company2
                          FN   FP   TP    FN   FP   TP
            April- 2012   112  0    0     112  0    0
            April- 2013   370  544  140   370  544  140
            April- 2014   499  50   24    499  50   24
            August- 2012  431  0    0     431  0    0
            August- 2013  496  0    0     496  0    0
            August- 2014  221  426  77    221  426  77

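I would like to iterate over the top level of the MultiIndex columns and create sum columns for each company (FN, FP and TP stand for the False Negative, False Positive and True Positive columns):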

FSUM = FN + FP

SUM = FN + FP + TP

                          company1                   company2
                          FN   FP   TP   FSUM  SUM   FN   FP   TP   FSUM  SUM
            April- 2012   112  0    0    112   112   112  0    0    112   112
            April- 2013   370  544  140  914   1054  370  544  140  914   1054
            April- 2014   499  50   24   549   573   499  50   24   549   573
            August- 2012  431  0    0    431   431   431  0    0    431   431
            August- 2013  496  0    0    496   496   496  0    0    496   496
            August- 2014  221  426  77   647   724   221  426  77   647   724

I do not know the company names in advance, so I need a loop.

【Comments】:

    Tags: python pandas


    【Solution 1】:

    You can make this easier by reshaping the data with a few .stack() and .unstack() calls:

    In [96]: df = df.unstack().unstack(1)
    
    In [97]: df
    Out[97]:
                           False Negative  False Positive  True Positive
    company1 April- 2012            112.0             0.0            0.0
             April- 2013            370.0           544.0          140.0
             April- 2014            499.0            50.0           24.0
             August- 2012           431.0             0.0            0.0
             August- 2013           496.0             0.0            0.0
             August- 2014           221.0           426.0           77.0
    company2 April- 2012            112.0             0.0            0.0
             April- 2013            370.0           544.0          140.0
             April- 2014            499.0            50.0           24.0
             August- 2012           431.0             0.0            0.0
             August- 2013           496.0             0.0            0.0
             August- 2014           221.0           426.0           77.0
    
    In [98]: df['SUM'] = df.sum(axis=1)
    
    In [99]: df['FSUM'] = df['False Negative'] + df['False Positive']
    
    In [100]: df = df.stack().unstack([0,2])
    
    In [101]: df
    Out[101]:
                       company1                                              \
                 False Negative False Positive True Positive     SUM   FSUM
    April- 2012           112.0            0.0           0.0   112.0  112.0
    April- 2013           370.0          544.0         140.0  1054.0  914.0
    April- 2014           499.0           50.0          24.0   573.0  549.0
    August- 2012          431.0            0.0           0.0   431.0  431.0
    August- 2013          496.0            0.0           0.0   496.0  496.0
    August- 2014          221.0          426.0          77.0   724.0  647.0
    
                       company2
                 False Negative False Positive True Positive     SUM   FSUM
    April- 2012           112.0            0.0           0.0   112.0  112.0
    April- 2013           370.0          544.0         140.0  1054.0  914.0
    April- 2014           499.0           50.0          24.0   573.0  549.0
    August- 2012          431.0            0.0           0.0   431.0  431.0
    August- 2013          496.0            0.0           0.0   496.0  496.0
    August- 2014          221.0          426.0          77.0   724.0  647.0
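
For use outside an interactive session, the same reshaping steps can be wrapped in a function. The following is a minimal sketch, not the answerer's code: the helper name `add_sums` and the shortened three-month sample are assumptions made for illustration. The company names are never hard-coded, which is what the question asks for.

```python
import pandas as pd

# Shortened sample: three months, identical figures for both companies,
# mirroring the question's data.
months = ["April- 2012", "April- 2013", "April- 2014"]
metrics = {
    "False Negative": [112.0, 370.0, 499.0],
    "False Positive": [0.0, 544.0, 50.0],
    "True Positive": [0.0, 140.0, 24.0],
}
df = pd.DataFrame(
    {
        (company, metric): values
        for company in ("company1", "company2")
        for metric, values in metrics.items()
    },
    index=months,
)


def add_sums(wide):
    """Return a new frame with per-company SUM and FSUM columns (hypothetical helper)."""
    # Move the companies into the row index: one row per (company, month),
    # one column per metric.
    tall = wide.unstack().unstack(1)
    # SUM is computed first so that it only covers the three original metrics.
    tall["SUM"] = tall.sum(axis=1)
    tall["FSUM"] = tall["False Negative"] + tall["False Positive"]
    # Move the companies back to the top column level.
    return tall.stack().unstack([0, 2])


result = add_sums(df)
print(result["company1"])
```

Because `SUM` is assigned before `FSUM`, the row total does not double count the false-result subtotal. Reversing those two lines would change the numbers.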
    

    【Comments】:

      【Solution 2】:

      One approach is to use sum with the level argument, then pd.concat, and finally sort_index:

      pd.concat([df,
                 df.loc(axis=1)[:,['False Negative','False Positive']].sum(level=0, axis=1).assign(indx2 = 'FSUM').set_index('indx2', append=True).unstack(),
                 df.sum(level=0, axis=1).assign(indx2='SUM').set_index('indx2', append=True).unstack()],
                axis=1).sort_index(axis=1)
      

      Output:

                   company1                                                      \
                       FSUM False Negative False Positive     SUM True Positive   
      April- 2012     112.0          112.0            0.0   112.0           0.0   
      April- 2013     914.0          370.0          544.0  1054.0         140.0   
      April- 2014     549.0          499.0           50.0   573.0          24.0   
      August- 2012    431.0          431.0            0.0   431.0           0.0   
      August- 2013    496.0          496.0            0.0   496.0           0.0   
      August- 2014    647.0          221.0          426.0   724.0          77.0   
      
                   company2                                                      
                       FSUM False Negative False Positive     SUM True Positive  
      April- 2012     112.0          112.0            0.0   112.0           0.0  
      April- 2013     914.0          370.0          544.0  1054.0         140.0  
      April- 2014     549.0          499.0           50.0   573.0          24.0  
      August- 2012    431.0          431.0            0.0   431.0           0.0  
      August- 2013    496.0          496.0            0.0   496.0           0.0  
      August- 2014    647.0          221.0          426.0   724.0          77.0  
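
The `level` argument of `DataFrame.sum` was deprecated in pandas 1.3 and removed in pandas 2.0, so the snippet above only runs on older versions. The sketch below shows one possible equivalent that groups on the transposed frame instead. The helper name `company_totals` and the shortened sample data are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Shortened sample: three months, identical figures for both companies,
# mirroring the question's data.
months = ["April- 2012", "April- 2013", "April- 2014"]
metrics = {
    "False Negative": [112.0, 370.0, 499.0],
    "False Positive": [0.0, 544.0, 50.0],
    "True Positive": [0.0, 140.0, 24.0],
}
df = pd.DataFrame(
    {
        (company, metric): values
        for company in ("company1", "company2")
        for metric, values in metrics.items()
    },
    index=months,
)


def company_totals(frame, label):
    """Sum `frame` per top-level column and tag the result with `label` (hypothetical helper)."""
    # Transpose so the companies sit in the row index, group on them, transpose back.
    totals = frame.T.groupby(level=0).sum().T
    # Rebuild a two-level column index: (company, label).
    totals.columns = pd.MultiIndex.from_product([totals.columns, [label]])
    return totals


# Boolean mask over the second column level selects only the "false" metrics.
is_false = df.columns.get_level_values(1).isin(["False Negative", "False Positive"])

result = pd.concat(
    [df, company_totals(df.loc[:, is_false], "FSUM"), company_totals(df, "SUM")],
    axis=1,
).sort_index(axis=1)
print(result["company1"])
```

As in the original answer, `sort_index(axis=1)` orders the second level alphabetically, so `FSUM` and `SUM` end up interleaved with the metric columns rather than appended at the end.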
      

      【Comments】:
