Pandas - optimize percentile calculation

【Question title】: Pandas - optimize percentile calculation
【Posted】: 2021-06-16 12:54:32
【Question】:

I have a dataset like this:

id     type     score
a1     ball       15
a2     ball       12
a1     pencil     10
a3     ball       8
a2     pencil     6

I want to find the rank of each id within each type. Since I will convert the rank to a percentile later, I would prefer to use rank.

The output should look like this:

id     type     score rank
a1     ball       15   1
a2     ball       12   2
a1     pencil     10   1
a3     ball       8    3
a2     pencil     6    2

What I have done so far is build a list of the unique type values and iterate over it:

test_data['percentile_from_all'] = 0
for i in unique_type_list:
    loc_i = test_data['type']==i
    percentiles = test_data.loc[loc_i,['score']].rank(pct = True)*100
    test_data.loc[loc_i,'percentile_from_all'] = percentiles.values

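This approach works for small datasets, but it becomes too slow even at 10k iterations. Is there a way to do this all at once, with something like apply?

Thanks!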

【Comments】:

Tags: python pandas rank

【Solution 1】:

Take a look at groupby:

df['rnk'] = df.groupby('type').score.rank(ascending=False)
df['rnk']
Out[67]: 
0    1.0
1    2.0
2    1.0
3    3.0
4    2.0
Name: rnk, dtype: float64
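
Since the question ultimately wants percentiles, the same groupby call can likely replace the per-type loop directly. The following is a minimal sketch, assuming the data is held in a DataFrame named df (as in the answer) with the sample rows from the question, and reusing the question's percentile_from_all column name and its rank(pct=True) call:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["a1", "a2", "a1", "a3", "a2"],
    "type": ["ball", "ball", "pencil", "ball", "pencil"],
    "score": [15, 12, 10, 8, 6],
})

# Rank within each type, highest score first.
df["rnk"] = df.groupby("type")["score"].rank(ascending=False)

# Percentile within each type in a single call, instead of looping over types.
# rank(pct=True) returns a fraction in (0, 1], so multiply by 100 as the question does.
df["percentile_from_all"] = df.groupby("type")["score"].rank(pct=True) * 100

print(df)
```

On the sample data, the top score in each type comes out at 100, and the lowest ball score (8) at about 33.33. Note that the percentile uses the default ascending order, as in the question's loop, so a higher score means a higher percentile.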

【Comments】:

Awesome! I was trying to make it work with apply and map, but groupby is much better! Thanks