【Question title】: Overall sum by groupby in pandas
【Posted】: 2019-11-21 16:19:05
【Question】:

I have a dataframe like the one below. It describes area usage across an entire city, say Bangalore.

Sector   Plot   Usage        Status             Area
A        1      Villa        Constructed        40
A        2      Residential  Constructed        50
A        3      Substation   Not_Constructed    120
A        4      Villa        Not_Constructed    60
A        5      Residential  Not_Constructed    30
A        6      Substation   Constructed        100
B        1      Villa        Constructed        80
B        2      Residential  Constructed        60
B        3      Substation   Not_Constructed    40
B        4      Villa        Not_Constructed    80
B        5      Residential  Not_Constructed    100
B        6      Substation   Constructed        40
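
To make the answers further down reproducible, the table above can be rebuilt as a DataFrame. This is only a minimal sketch: the column names and values are copied from the question, while the variable names `rows` and `df` are assumptions chosen for illustration.

```python
import pandas as pd

# Rows copied verbatim from the table in the question.
rows = [
    ("A", 1, "Villa", "Constructed", 40),
    ("A", 2, "Residential", "Constructed", 50),
    ("A", 3, "Substation", "Not_Constructed", 120),
    ("A", 4, "Villa", "Not_Constructed", 60),
    ("A", 5, "Residential", "Not_Constructed", 30),
    ("A", 6, "Substation", "Constructed", 100),
    ("B", 1, "Villa", "Constructed", 80),
    ("B", 2, "Residential", "Constructed", 60),
    ("B", 3, "Substation", "Not_Constructed", 40),
    ("B", 4, "Villa", "Not_Constructed", 80),
    ("B", 5, "Residential", "Not_Constructed", 100),
    ("B", 6, "Substation", "Constructed", 40),
]
df = pd.DataFrame(rows, columns=["Sector", "Plot", "Usage", "Status", "Area"])

# Sanity check: sectors A and B contribute 400 each.
print(df.groupby("Sector")["Area"].sum())
print(df["Area"].sum())  # 800
```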

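Bangalore consists of two sectors, A and B.

From the above, I want to calculate the total area of Bangalore and its usage distribution.

Expected output: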

City           Total_Area  %_Villa  %_Resid  %_Substation  %_Constructed  %_Not_Constructed
Bangalore(A+B) 800         32.5     30       37.5          46.25          53.75

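【Comments】:

    Tags: pandas pandas-groupby


    【Solution 1】:

    I think you need to set the Sector column to a single scalar value (the city name) before applying the solution. This assumes the data contains only sectors A and B: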

    import pandas as pd

    # treat both sectors as one city
    df['Sector'] = 'Bangalore(A+B)'
    # aggregate sum per 2 columns Sector and Usage
    df1 = df.groupby(['Sector', 'Usage'])['Area'].sum()
    # percentage by division of total per Sector
    # (groupby(level=0).sum() replaces sum(level=0), which newer pandas no longer accepts)
    df1 = df1.div(df1.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
    # aggregate sum per 2 columns Sector and Status
    df2 = df.groupby(['Sector', 'Status'])['Area'].sum()
    df2 = df2.div(df2.groupby(level=0).sum(), level=0).unstack(fill_value=0).mul(100).add_prefix('%_')
    # total Area per Sector
    s = df.groupby('Sector')['Area'].sum().rename('Total_Area')
    # join all together
    dfA = pd.concat([s, df1, df2], axis=1).reset_index()
    print(dfA)
               Sector  Total_Area  %_Residential  %_Substation  %_Villa  \
    0  Bangalore(A+B)         800           30.0          37.5     32.5   
    
       %_Constructed  %_Not_Constructed  
    0          46.25              53.75  
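
Because the whole city is treated as one group, the same numbers can also be computed without the `Sector` level at all. The sketch below is one possible variation, not the answerer's code: `pct_share` is a hypothetical helper name, and the input table is rebuilt inline from the question so the block runs on its own.

```python
import pandas as pd

# Input rebuilt from the question's table (Plot column omitted; it is not used here).
df = pd.DataFrame({
    "Sector": ["A"] * 6 + ["B"] * 6,
    "Usage": ["Villa", "Residential", "Substation"] * 4,
    "Status": (["Constructed"] * 2 + ["Not_Constructed"] * 3 + ["Constructed"]) * 2,
    "Area": [40, 50, 120, 60, 30, 100, 80, 60, 40, 80, 100, 40],
})

def pct_share(frame, column):
    """Hypothetical helper: percentage of the total Area per value of `column`."""
    totals = frame.groupby(column)["Area"].sum()
    return totals.div(totals.sum()).mul(100).round(2).add_prefix("%_")

# Stack the pieces into one Series, then turn it into a single-row frame.
summary = pd.concat([
    pd.Series({"Total_Area": df["Area"].sum()}),
    pct_share(df, "Usage"),
    pct_share(df, "Status"),
])
result = summary.to_frame().T
result.insert(0, "City", "Bangalore(A+B)")
print(result)
```

Using a small helper avoids repeating the divide-and-prefix chain for each category column; the trade-off is that it only handles a single city, whereas the groupby on `Sector` above generalizes to several cities.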
    

    【Comments】:

      【Solution 2】:

      A simple pivot table can help!

      1. One-line pandas solution: 80% of the work done

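      # aggfunc='sum' avoids the NumPy import; unstack() turns the pivot into a Series indexed by (Usage, Status)
      pv = df.pivot_table(values='Area', aggfunc='sum', index=['Status'], columns=['Usage'], margins=True, margins_name='Total', fill_value=0).unstack()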

      2. Now format as %: 90% of the work done

      total = pv['Total']['Total']
      # the pivot key is 'Residential' (not 'Resid'); multiply by 100 before rounding so 46.25 is not cut to 46.0
      ans = pd.DataFrame([[pv['Villa']['Total'] / total,
                           pv['Residential']['Total'] / total,
                           pv['Substation']['Total'] / total,
                           pv['Total']['Constructed'] / total,
                           pv['Total']['Not_Constructed'] / total]]).mul(100).round(2)

      3. Add the total column: 99% of the work done

      ans['Total'] = pv['Total']['Total']

      4. Rename the columns and arrange them in the expected order: done!

      ans.columns=['%_Villa','%_Resid','%_Substation','%_Constructed','%_Not_Constructed','Total']

      ans = ans[['Total','%_Villa','%_Resid','%_Substation','%_Constructed','%_Not_Constructed']]
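
The four steps above can also be sketched as one self-contained script that reads the margins of the pivot table directly with `.loc` instead of unstacking it. This is an illustrative variation under stated assumptions: the input is rebuilt from the question's table, and the names `usage_pct` and `status_pct` are not from the original answer.

```python
import pandas as pd

# Input rebuilt from the question's table (only the columns the pivot needs).
df = pd.DataFrame({
    "Usage": ["Villa", "Residential", "Substation"] * 4,
    "Status": (["Constructed"] * 2 + ["Not_Constructed"] * 3 + ["Constructed"]) * 2,
    "Area": [40, 50, 120, 60, 30, 100, 80, 60, 40, 80, 100, 40],
})

# Status x Usage pivot with a 'Total' margin row and column.
pv = df.pivot_table(values="Area", index="Status", columns="Usage",
                    aggfunc="sum", margins=True, margins_name="Total",
                    fill_value=0)

total = pv.loc["Total", "Total"]
# The margin row holds totals per Usage; the margin column holds totals per Status.
usage_pct = pv.loc["Total"].drop("Total").div(total).mul(100).add_prefix("%_")
status_pct = pv["Total"].drop("Total").div(total).mul(100).add_prefix("%_")

# Combine into a single-row frame, rounding after scaling to percent.
ans = pd.concat([pd.Series({"Total": total}), usage_pct, status_pct]).round(2).to_frame().T
print(ans)
```

Reading the margins by label keeps the column names tied to the data, so no manual renaming step is needed; note that the usage columns then carry the full name `%_Residential` rather than the abbreviated `%_Resid`.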

      【Comments】:
