Calculate nonzeros percentage for specific columns and each row in Pandas: answers

【Question title】: Calculate nonzeros percentage for specific columns and each row in Pandas
【Posted】: 2019-03-06 11:09:07
【Question】:

Suppose I have the following DataFrame:

   import pandas as pd
   df = pd.DataFrame({'name':['john','mary','peter','jeff','bill','lisa','jose'], 'gender':['M','F','M','M','M','F','M'],'state':['california','dc','california','dc','california','texas','texas'],'num_children':[2,0,0,3,2,1,4],'num_pets':[5,1,0,5,2,2,3]})

    name gender       state      num_children  num_pets
0   john      M  california             2         5
1   mary      F          dc             0         1
2  peter      M  california             0         0
3   jeff      M          dc             3         5
4   bill      M  california             2         2
5   lisa      F       texas             1         2
6   jose      M       texas             4         3

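I want to create a new row and a new column, both labelled pct., holding the percentage of zero values in the num_children and num_pets columns. Expected output: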

    name gender       state      num_children  num_pets   pct.
0   pct.                              28.6%     14.3%     
1   john      M  california             2         5        0% 
2   mary      F          dc             0         1       50%
3  peter      M  california             0         0      100%
4   jeff      M          dc             3         5        0% 
5   bill      M  california             2         2        0%
6   lisa      F       texas             1         2        0%
7   jose      M       texas             4         3        0%

I have already computed the percentage of zeros in each row for the target columns:

df['pct'] = df[['num_children', 'num_pets']].astype(bool).sum(axis=1)/2
df['pct.'] = 1-df['pct']
del df['pct']
df['pct.'] = pd.Series(["{0:.0f}%".format(val * 100) for val in df['pct.']], index = df.index)
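The /2 above hard-codes the number of target columns. A minimal sketch, assuming the same data, that derives the divisor from the column list instead and uses Python's % format specifier (which multiplies by 100 and appends a percent sign) in place of the manual val * 100. The cols name is illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
                   'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
                   'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
                   'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})

cols = ['num_children', 'num_pets']  # target columns (illustrative name)

# Fraction of nonzero values per row, divided by the number of columns rather than a literal 2
nonzero_frac = df[cols].astype(bool).sum(axis=1) / len(cols)

# The share of zeros is the complement; the '%' format spec scales by 100 and adds '%'
df['pct.'] = (1 - nonzero_frac).map('{:.0%}'.format)
print(df['pct.'].tolist())
```

Adding a third target column then only requires extending cols.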

    name gender       state  num_children  num_pets  pct.
0   john      M  california             2         5    0%
1   mary      F          dc             0         1   50%
2  peter      M  california             0         0  100%
3   jeff      M          dc             3         5    0%
4   bill      M  california             2         2    0%
5   lisa      F       texas             1         2    0%
6   jose      M       texas             4         3    0%

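But I don't know how to insert the result below as the pct. row shown in the expected output. Please help me get the expected result in a more Pythonic way. Thanks.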

df[['num_children', 'num_pets']].astype(bool).sum(axis=0)/len(df.num_children)
Out[153]: 
num_children    0.714286
num_pets        0.857143
dtype: float64
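Note that astype(bool) marks nonzero values, so 0.714286 and 0.857143 above are nonzero fractions, while the expected output shows zero fractions (28.6% and 14.3%). A short sketch, assuming the same two columns, showing that the complement of the attempt above equals the mean of an "== 0" mask:

```python
import pandas as pd

df = pd.DataFrame({'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
cols = ['num_children', 'num_pets']

# Column-wise fraction of nonzero values (what the attempt above computes)
nonzero = df[cols].astype(bool).sum(axis=0) / len(df)

# Fraction of zeros: either the complement of the above, or the mean of an ==0 mask
zero = 1 - nonzero
zero_direct = df[cols].eq(0).mean()

print(zero.map('{:.1%}'.format).tolist())  # ['28.6%', '14.3%']
```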

Update: the same thing, but computing sums instead of percentages. Many thanks to @jezrael:

df['sums'] = df[['num_children', 'num_pets']].sum(axis=1)
df1 = (df[['num_children', 'num_pets']].sum()
                                       .to_frame()
                                       .T
                                       .assign(name='sums'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state  num_children  num_pets sums
0   sums                               12        18    
1   john      M  california             2         5   7
2   mary      F          dc             0         1   1
3  peter      M  california             0         0   0
4   jeff      M          dc             3         5   8
5   bill      M  california             2         2   4
6   lisa      F       texas             1         2   3
7   jose      M       texas             4         3   7
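The update and the percentage version share one pattern: build a one-row frame from a per-column Series, reindex it to the original columns with blanks, and concat it on top. A sketch that factors this into a helper; prepend_summary_row and its label_col parameter are hypothetical names introduced here, not pandas API.

```python
import pandas as pd

def prepend_summary_row(df, summary, label, label_col='name'):
    """Hypothetical helper: insert `summary` (a Series indexed by column name)
    as a new first row labelled `label`, with '' in every other column."""
    row = summary.to_frame().T.assign(**{label_col: label})
    row = row.reindex(columns=df.columns, fill_value='')
    return pd.concat([row, df], ignore_index=True, sort=False)

df = pd.DataFrame({'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
                   'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
cols = ['num_children', 'num_pets']

df['sums'] = df[cols].sum(axis=1)               # per-row sums
out = prepend_summary_row(df, df[cols].sum(), 'sums')  # per-column sums as row 0
print(out)
```

Passing df[cols].eq(0).mean() (formatted or not) with label 'pct.' would give the percentage row instead.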

【Question discussion】:

Tags: python pandas

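【Solution 1】:

You can compare the values with 0 using DataFrame.eq and take the mean of the resulting boolean mask, because by definition sum/len = mean. Then multiply by 100 and append the percent sign with apply: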

s = df[['num_children', 'num_pets']].eq(0).mean(axis=1)
df['pct'] = s.mul(100).apply("{0:.0f}%".format)
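Why mean works here: eq(0) produces a boolean mask in which True counts as 1, so the row mean is (number of zeros) / (number of target columns). A small check on the question's data, rebuilt so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
mask = df[['num_children', 'num_pets']].eq(0)  # True where the value is 0

via_mean = mask.mean(axis=1)
via_sum_len = mask.sum(axis=1) / mask.shape[1]  # zeros per row / number of columns

print(via_mean.tolist())  # [0.0, 0.5, 1.0, 0.0, 0.0, 0.0, 0.0]
```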

For the first row, create a new DataFrame with the same columns as the original, then concat it with the original:

df1 = (df[['num_children', 'num_pets']].eq(0)
                                       .mean()
                                       .mul(100)
                                       .apply("{0:.1f}%".format)
                                       .to_frame()
                                       .T
                                       .assign(name='pct.'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df], 
                ignore_index=True, sort=False)
print (df)
    name gender       state num_children num_pets   pct
0   pct.                           28.6%    14.3%      
1   john      M  california            2        5    0%
2   mary      F          dc            0        1   50%
3  peter      M  california            0        0  100%
4   jeff      M          dc            3        5    0%
5   bill      M  california            2        2    0%
6   lisa      F       texas            1        2    0%
7   jose      M       texas            4        3    0%
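One consequence of this layout: because the first row holds strings such as '28.6%', concat upcasts num_children and num_pets to object dtype, so numeric operations on those columns no longer work as before. A sketch reproducing the answer's steps on the question's data and recovering numbers from the data rows with pd.to_numeric (the out and numeric names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
                   'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
cols = ['num_children', 'num_pets']

# Summary row formatted as strings, as in the answer above
df1 = (df[cols].eq(0).mean().mul(100).apply("{0:.1f}%".format)
       .to_frame().T.assign(name='pct.'))
out = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df],
                ignore_index=True, sort=False)

print(out.dtypes)  # num_children and num_pets are now object

# Data rows start at index 1; convert them back to numbers if needed
numeric = out.loc[1:, cols].apply(pd.to_numeric)
print(numeric.sum())
```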

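【Discussion】:

The percentage row can be added this way, but wouldn't it be better to keep these percentages in a separate DataFrame?
@suicidalteddy - Yes, that is possible, but some data processing is needed here, so working with a Series is easier. Still, this is one possible solution.
Thanks for your excellent solution, as always. What if I want to calculate sums instead of percentages?
@ahbon - Do you mean changing s = df[['num_children', 'num_pets']].eq(0).mean(axis=1) to s = df[['num_children', 'num_pets']].sum(axis=1) and then df['pct'] = s?
I edited your code and it works for calculating sums. Please check my updated question. Thanks.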
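For comparison, a minimal sketch of the alternative raised in the first comment: keep df numeric and store the per-column statistics in their own DataFrame. The summary name, the pct_zero column, and the row labels are illustrative choices, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
                   'num_children': [2, 0, 0, 3, 2, 1, 4],
                   'num_pets': [5, 1, 0, 5, 2, 2, 3]})
cols = ['num_children', 'num_pets']

# Per-row share of zeros stays a float column, so it can still be sorted or filtered
df['pct_zero'] = df[cols].eq(0).mean(axis=1)

# Per-column statistics live in a separate frame, one row per statistic
summary = pd.DataFrame({'pct_zero': df[cols].eq(0).mean(),
                        'sum': df[cols].sum()}).T

# Format only for display; the stored values remain numeric
print(summary.loc['pct_zero'].map('{:.1%}'.format))
print(summary.loc['sum'])
```

The trade-off is that the summary is no longer visible in the same printed table, but neither frame ends up with mixed string and number columns.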