Calculate percentage based on specific column value

【Question title】: Calculate percentage based on specific column value
【Posted】: 2020-01-31 20:00:44
【Question】:

I want to calculate a percentage for each row. Here is a sample DataFrame:

    KEY  DESCR  counts
0   2    to A   1
1   2    to B   1
2   20   to C   1
3   35   to D   2
4   110  to E   4
5   110  to F   1
6   110  to G   1

The percentage formula is: (counts / sum of counts over all rows with the same KEY) * 100
Example for the first row (KEY 2): (1/2)*100

Below is the code I am stuck on. I have tried many times without success.

percentage = []

for i in range(len(df)):
    percentage.append((df['counts'][i] / ...............) * 100) 

df['PERCENTAGE'] = percentage 
df

The expected output is:

    KEY  DESCR  counts  PERCENTAGE
0   2    to A   1       50
1   2    to B   1       50
2   20   to C   1       100
3   35   to A   2       100
4   110  to E   4       67
5   110  to C   1       16
6   110  to G   1       16

Can anyone help me solve this? Thanks.

【Comments】:

range(len(df)) is not Pythonic... use for i in df['counts']

Tags: python pandas dataframe

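【Solution 1】:

If performance matters, use GroupBy.transform with sum to get each group's total, divide the original column by it with Series.div, and finally multiply by 100 with Series.mul: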

df['PERCENTAGE'] = df['counts'].div(df.groupby('KEY')['counts'].transform('sum')).mul(100)

You can also divide each value inside its group with a lambda, but this is slower for a large DataFrame or many groups:

df['PERCENTAGE'] = df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100)

print(df)
   KEY DESCR  counts  PERCENTAGE
0    2  to A       1   50.000000
1    2  to B       1   50.000000
2   20  to C       1  100.000000
3   35  to D       2  100.000000
4  110  to E       4   66.666667
5  110  to F       1   16.666667
6  110  to G       1   16.666667
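
As a self-contained sketch of the two approaches above, the snippet below rebuilds the sample DataFrame from the question and checks that both give the same column. The names `group_total`, `fast` and `slow` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "KEY": [2, 2, 20, 35, 110, 110, 110],
    "DESCR": ["to A", "to B", "to C", "to D", "to E", "to F", "to G"],
    "counts": [1, 1, 1, 2, 4, 1, 1],
})

# Vectorized: broadcast each group's total back onto its rows, then divide.
group_total = df.groupby("KEY")["counts"].transform("sum")
fast = df["counts"].div(group_total).mul(100)

# Per-group lambda: same values, but a Python function is called once per group.
slow = df.groupby("KEY")["counts"].transform(lambda x: x / x.sum()).mul(100)

df["PERCENTAGE"] = fast
print(df)
```

Because transform('sum') returns a Series aligned to the original index, the division needs no explicit loop or merge.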

【Comments】:

.apply(math.floor) ?
@koalaok or np.floor(df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100))
Is np faster, more Pythonic, or just an alternative?
@koalaok - it is faster, because numpy is faster than pure Python loops
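
To illustrate the flooring discussed in these comments, here is a minimal sketch comparing `np.floor` on the whole column with `math.floor` applied element by element. The Series `pct` holds the percentages from the answer above; the variable names are illustrative.

```python
import math

import numpy as np
import pandas as pd

# Percentages from the answer: 4/6 and 1/6 of 100 for KEY 110.
pct = pd.Series([50.0, 50.0, 100.0, 100.0, 400 / 6, 100 / 6, 100 / 6])

# np.floor operates on the whole Series at once and returns floats.
floored_np = np.floor(pct)

# Series.apply(math.floor) calls a Python function for every element.
floored_py = pct.apply(math.floor)

# Cast to int for whole-number output.
print(floored_np.astype(int).tolist())
```

Note that flooring gives 66 for the "to E" row, whereas the question's expected output shows 67. `pct.round()` would give 67 there, but 17 rather than 16 for the two remaining KEY 110 rows, so neither reproduces the expected column exactly.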