Ranking pandas data frame rows on multiple columns (answers)

[Question title]: Ranking pandas data frame rows on multiple columns
[Posted]: 2020-12-02 05:52:01
[Question description]:

I am new to pandas. I am trying to understand how to do something in pandas the way I would do it in SQL.

I have a table like this:

Account Company Blotter
112233  10      62
233445  12      62
233445  10      66
343454  21      66
343454  21      64
768876  25      54

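In SQL, if a given account appears in multiple rows, I would use rank(), and if I wanted to prioritize a particular company, I would write a case statement to force that company to rank first. I can also use the Blotter column as an additional ranking parameter. For example: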

rank() over(
    partition by ACCOUNT 
    order by case 
                when COMPANY='12' then 0 
                when COMPANY='21' then 1 
                else COMPANY 
             end, 
             case 
                when BLOTTER ='66' then 0 
                else BLOTTER 
             end
)

Expected output:

   Account  Company  Blotter  rank
0   112233       10       62     1
1   233445       12       62     1
2   233445       10       66     2
3   343454       21       66     1
4   343454       21       64     2
5   768876       25       54     1

[Question comments]:

Please post your expected output.

Tags: pandas ranking

[Solution 1]:

You might want to try this:

# recompute the sort criteria for Company and Blotter (the two CASE expressions)
ser_sort_company = df['Company'].map({12: 0, 21: 1}).fillna(df['Company'])
# Blotter 66 gets priority 0, every other value keeps its own value
ser_sort_blotter = df['Blotter'].map({66: 0}).fillna(df['Blotter'])
df['rank'] = (df
     # temporarily create sort columns
     .assign(sort_company=ser_sort_company)
     .assign(sort_blotter=ser_sort_blotter)
     # temporarily sort the result
     # this replaces the ORDER BY part
     .sort_values(['sort_company', 'sort_blotter'])
     # group by Account to replace the PARTITION BY part
     .groupby('Account')
     # get the position of the record in the group (RANK part);
     # tied rows still get distinct positions, like ROW_NUMBER
     .cumcount() + 1
)

df

This evaluates to:

   Account  Company  Blotter  rank
0   112233       10       62     1
1   233445       12       62     1
2   233445       10       66     2
3   343454       21       66     1
4   343454       21       64     2
5   768876       25       54     1
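
If rows that tie on both sort keys should share a rank, as SQL's rank() does, one option is to turn the two mapped keys into a single ordered key and rank that within each account. The following is only a minimal sketch under some assumptions: the helper name `sql_style_rank` is hypothetical, and the priority mappings `{12: 0, 21: 1}` for Company and `{66: 0}` for Blotter are taken from the CASE expressions in the question.

```python
import pandas as pd


def sql_style_rank(df, company_priority, blotter_priority):
    """Hypothetical helper: RANK() OVER (PARTITION BY Account ORDER BY ...)."""
    # Mapped values take priority; unmapped values keep their own value.
    sort_company = df["Company"].map(company_priority).fillna(df["Company"])
    sort_blotter = df["Blotter"].map(blotter_priority).fillna(df["Blotter"])
    keys = pd.DataFrame({"c": sort_company, "b": sort_blotter})
    # ngroup() numbers the distinct (c, b) pairs in sorted order, so the
    # result is a single integer key that preserves the two-column ordering.
    composite = keys.groupby(["c", "b"]).ngroup()
    # method="min" gives tied rows the same rank and leaves gaps afterwards.
    return composite.groupby(df["Account"]).rank(method="min").astype(int)


df = pd.DataFrame(
    [
        [112233, 10, 62],
        [233445, 12, 62],
        [233445, 10, 66],
        [343454, 21, 66],
        [343454, 21, 64],
        [768876, 25, 54],
    ],
    columns=["Account", "Company", "Blotter"],
)
df["rank"] = sql_style_rank(df, {12: 0, 21: 1}, {66: 0})
print(df)
```

The sample data has no ties, so the result is the same as the positional numbering from cumcount. The two approaches differ only when two rows of one account share both sort keys.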

[Comments]:

Thanks a lot, @jottbe. Very sharp solution, it works perfectly!
Glad I could help. If you like it, you can mark it as the answer.

[Solution 2]:

The pandas DataFrame sort_values method may be what you are looking for.

import pandas as pd

data = [
[112233, 10, 62],
[233445, 12, 62],
[233445, 10, 66],
[343454, 21, 66],
[343454, 21, 64],
[768876, 25, 54]]

df = pd.DataFrame(data, columns=['Account', 'Company', 'Blotter'])
df

   Account  Company  Blotter
0   112233       10       62
1   233445       12       62
2   233445       10       66
3   343454       21       66
4   343454       21       64
5   768876       25       54

df_shuffled = df.sample(frac=1, random_state=0)   # shuffle the rows
df_shuffled

   Account  Company  Blotter
5   768876       25       54
2   233445       10       66
1   233445       12       62
3   343454       21       66
0   112233       10       62
4   343454       21       64

df_shuffled.sort_values(by=['Account', 'Company', 'Blotter'], 
                        ascending=[True, False, False])

   Account  Company  Blotter
0   112233       10       62
1   233445       12       62
2   233445       10       66
3   343454       21       66
4   343454       21       64
5   768876       25       54
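
Sorting alone does not produce the rank column asked for in the question. As a minimal sketch, assuming the same sort order as above, the position of each row within its Account can be added with groupby and cumcount. The names `ordered` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [112233, 10, 62],
        [233445, 12, 62],
        [233445, 10, 66],
        [343454, 21, 66],
        [343454, 21, 64],
        [768876, 25, 54],
    ],
    columns=["Account", "Company", "Blotter"],
)

# Same ordering as the sort_values call above.
ordered = df.sort_values(
    by=["Account", "Company", "Blotter"],
    ascending=[True, False, False],
)
# Number the rows of each Account in their sorted order, starting at 1.
ordered["rank"] = ordered.groupby("Account").cumcount() + 1
# Restore the original row order.
result = ordered.sort_index()
print(result)
```

For this sample the descending sort happens to reproduce the priorities in the question's CASE expressions (Company 12 before 10, Blotter 66 before 64). In general an explicit priority mapping would be needed, since descending order is not the same as "this value first".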

[Comments]: