【Question title】: Adding column in pandas with several conditions based on other columns in dataframe
【Posted】: 2018-05-15 12:10:19
【Question】:

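First off, apologies if this is already somewhere on StackOverflow; I spent an hour experimenting on my own and then an hour searching, but couldn't find it. I'm sure there must be an elegant (and probably basic) solution.

I have the following DataFrame: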

       Admit  Gender Dept  Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted  Female    A    89
3   Rejected  Female    A    19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted  Female    B    17
7   Rejected  Female    B     8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted  Female    C   202
11  Rejected  Female    C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted  Female    D   131
15  Rejected  Female    D   244
16  Admitted    Male    E    53
17  Rejected    Male    E   138
18  Admitted  Female    E    94
19  Rejected  Female    E   299
20  Admitted    Male    F    22
21  Rejected    Male    F   351
22  Admitted  Female    F    24
23  Rejected  Female    F   317

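I'd like to add a "Proportion" column that gives, for each department and gender, the proportion of admitted and rejected applicants.

So that: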

df.loc[0, 'Proportion'] = 512/(512+313) = 0.6206
df.loc[1, 'Proportion'] = 313/(512+313) = 0.3794
...

and so on.

I tried to start by adding a "Total" column, using variations of:

data.groupby(['Dept', 'Gender'])[['Freq']].sum()

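But I can't seem to look up the values of this grouped DataFrame using the values in each row of the original DataFrame.

I also tried using a lambda function, but got a "function is not iterable" error.

I suppose one could loop over it row by row, since it's a small dataset, but that won't be an option when I need to do something like this in the future.

Please help a newbie and aspiring data scientist.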

【Comments】:

    Tags: python pandas dataframe conditional match


    【Solution 1】:

    Use transform to get a Series of group totals with the same size as the original DataFrame, then divide the column by it with div:

    data['new'] = data['Freq'].div(data.groupby(['Dept', 'Gender'])['Freq'].transform('sum'))
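As a self-contained sketch of the transform approach, the snippet below rebuilds only departments A and B from the question's table and divides each Freq by its group total. The variable name data follows the answer; the intermediate name totals and the column name Proportion (taken from the question's wording) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Departments A and B only, copied from the question's table.
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# transform("sum") broadcasts each group's total back to every row of
# that group, so the result has the same index as `data`.
totals = data.groupby(["Dept", "Gender"])["Freq"].transform("sum")

# Row-wise division: each frequency over its (Dept, Gender) total.
data["Proportion"] = data["Freq"].div(totals)

print(data)
```

Because the totals keep the original index, no lookup or merge is needed; the division aligns row by row.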
    

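    Or use apply with a custom function:

    # group_keys=False keeps the original row index, so the result aligns with data on recent pandas versions
    data['new'] = data.groupby(['Dept', 'Gender'], group_keys=False)['Freq'].apply(lambda x: x/x.sum())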
    

    print (data)
           Admit  Gender Dept  Freq       new
    0   Admitted    Male    A   512  0.620606
    1   Rejected    Male    A   313  0.379394
    2   Admitted  Female    A    89  0.824074
    3   Rejected  Female    A    19  0.175926
    4   Admitted    Male    B   353  0.630357
    5   Rejected    Male    B   207  0.369643
    6   Admitted  Female    B    17  0.680000
    7   Rejected  Female    B     8  0.320000
    8   Admitted    Male    C   120  0.369231
    9   Rejected    Male    C   205  0.630769
    10  Admitted  Female    C   202  0.340641
    11  Rejected  Female    C   391  0.659359
    12  Admitted    Male    D   138  0.330935
    13  Rejected    Male    D   279  0.669065
    14  Admitted  Female    D   131  0.349333
    15  Rejected  Female    D   244  0.650667
    16  Admitted    Male    E    53  0.277487
    17  Rejected    Male    E   138  0.722513
    18  Admitted  Female    E    94  0.239186
    19  Rejected  Female    E   299  0.760814
    20  Admitted    Male    F    22  0.058981
    21  Rejected    Male    F   351  0.941019
    22  Admitted  Female    F    24  0.070381
    23  Rejected  Female    F   317  0.929619
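
The question's first attempt, aggregating with groupby(...).sum(), can also be carried through by merging the totals back onto the original rows. This is only a sketch of that alternative, not part of the answer above; it again uses departments A and B only, and the names totals, merged and check are illustrative.

```python
import pandas as pd

# Departments A and B only, copied from the question's table.
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# One row per (Dept, Gender) pair holding the group total.
totals = (
    data.groupby(["Dept", "Gender"], as_index=False)["Freq"]
    .sum()
    .rename(columns={"Freq": "Total"})
)

# A left merge on the grouping keys attaches the matching total to each row.
merged = data.merge(totals, on=["Dept", "Gender"], how="left")
merged["Proportion"] = merged["Freq"] / merged["Total"]

# Sanity check: the proportions within each group should add up to 1.
check = merged.groupby(["Dept", "Gender"])["Proportion"].sum()
print(merged)
print(check)
```

The transform version is shorter, but the merge version keeps the Total column around, which is what the question originally set out to add.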
    

    【Discussion】:

    • Thank you very much, this is exactly what I was hoping for.