Derive a feature or column based on a given condition in pandas: answers

【Question title】: Derive a feature or column based on the given condition in pandas
【Posted】: 2021-03-11 12:03:36
【Question】:

I have a df like the one shown below:

ID     Age_days    N_30     N_31_90     N_91_180      N_180_365
1      201         60       15          30            1
2      800         0        15          5             10
3      800         0        0           10            6
4      100         0        0           0             370
5      600         0        6           5             10
6      800         0        0           15            6
7      500         10       10          30            9
8      200         0        0           0             0
9      500         0        0           0             0

From the df above, I want to derive a column named Recency.

Explanation:

if df['N_30'] != 0, then Recency = (30/df['N_30'])
elif df['N_31_90'] != 0 then Recency = 30 + (60/df['N_31_90'])
elif df['N_91_180'] != 0 then Recency = 90 + (90/df['N_91_180'])
elif df['N_181_365'] != 0 then Recency = 180 + (185/df['N_181_365'])
else 
  if df['age_days'] <= 365, Recency = df['age_days']
  else Recency = 366
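
Read as plain Python, the rule above could be sketched for a single row as follows. This is only an illustration: the function name `recency` and its parameter names are made up here, and the last bucket is assumed to be the `N_180_365` column from the table, since the table has no `N_181_365` column.

```python
def recency(n_30, n_31_90, n_91_180, n_180_365, age_days):
    """Return Recency for one row, checking buckets from most to least recent."""
    if n_30 != 0:
        return 30 / n_30
    if n_31_90 != 0:
        return 30 + 60 / n_31_90
    if n_91_180 != 0:
        return 90 + 90 / n_91_180
    if n_180_365 != 0:  # assumed column name; the rule text says N_181_365
        return 180 + 185 / n_180_365
    # No activity in any bucket: fall back to the age, capped at 366.
    return age_days if age_days <= 365 else 366


print(recency(60, 15, 30, 1, 201))  # 0.5
print(recency(0, 0, 0, 0, 500))     # 366
```

Only the first non-zero bucket matters, so later buckets are ignored once an earlier one matches.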

Expected output:

ID     Age_days    N_30     N_31_90     N_91_180      N_180_365    Recency
1      201         60       15          30            1            30/60 = 0.5
2      800         0        15          5             10           30+(60/15) = 34
3      800         0        0           10            6            90+(90/10) = 99
4      100         0        0           0             370          180+(185/370) = 180.5
5      600         0        6           5             10           30+(60/6) = 40
6      800         0        0           15            6            90+(90/15) = 96
7      500         10       10          30            9            30/10 = 3
8      200         0        0           0             0            200
9      500         0        0           0             0            366

I tried the code below:

pd.set_option("use_inf_as_na", True)
df2 = df[['N_30', 'N_31_90', 'N_91_180', 'N_180_365']]
df["Recency"] = (df2.eq(0) * [30, 60, 90, 180]).sum(1) + ([30, 60, 90, 185] / df2).bfill(1).iloc[:, 0]
df["Recency"].fillna(366)

【Question comments】:

Tags: python-3.x pandas dataframe

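【Solution 1】:

Use numpy.select, which returns for each row the choice paired with the first condition that is True, and the default when no condition holds: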

import numpy as np

conditions = [df['N_30'] != 0, df['N_31_90'] != 0, df['N_91_180'] != 0, df['N_180_365'] != 0, df['Age_days'] <= 365]

choices = [(30/df['N_30']), 30 + (60/df['N_31_90']), 90 + (90/df['N_91_180']), 180 + (185/df['N_180_365']), df['Age_days']]

df['Recency']=np.select(conditions, choices, default=366)

Output:

   ID  Age_days  N_30  N_31_90  N_91_180  N_180_365  Recency
0   1       201    60       15        30          1      0.5
1   2       800     0       15         5         10     34.0
2   3       800     0        0        10          6     99.0
3   4       100     0        0         0        370    180.5
4   5       600     0        6         5         10     40.0
5   6       800     0        0        15          6     96.0
6   7       500    10       10        30          9      3.0
7   8       200     0        0         0          0    200.0
8   9       500     0        0         0          0    366.0

I assumed a small correction: I used N_180_365 instead of N_181_365, because the latter appears in your conditions but not in the DF.
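
For anyone who wants to reproduce the output above, here is a self-contained sketch that rebuilds the sample frame and applies the same `np.select` call. The names `conditions` and `choices` follow the answer; the DataFrame construction is assumed from the table in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Age_days": [201, 800, 800, 100, 600, 800, 500, 200, 500],
    "N_30": [60, 0, 0, 0, 0, 0, 10, 0, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0, 0],
    "N_91_180": [30, 5, 10, 0, 5, 15, 30, 0, 0],
    "N_180_365": [1, 10, 6, 370, 10, 6, 9, 0, 0],
})

# Conditions are checked in order; the first True one wins for each row.
conditions = [
    df["N_30"] != 0,
    df["N_31_90"] != 0,
    df["N_91_180"] != 0,
    df["N_180_365"] != 0,
    df["Age_days"] <= 365,
]

# One candidate value per condition, in the same order.
choices = [
    30 / df["N_30"],
    30 + 60 / df["N_31_90"],
    90 + 90 / df["N_91_180"],
    180 + 185 / df["N_180_365"],
    df["Age_days"],
]

# Rows where no condition holds (all counts zero, age above 365) get 366.
df["Recency"] = np.select(conditions, choices, default=366)
print(df)
```

Every entry in `choices` is computed for all rows before `np.select` picks one, so a zero count produces `inf` in that intermediate Series. Those positions are never selected, because the matching condition is False there.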

【Comments】:

Could you share your output?

【Solution 2】:

For learning purposes only.

You can try creating a dict and mapping the elements.

def func(x):
    if (x[x['coln']]!=0):
#     if x!=np.nan:
        return (d[x['coln']](x[x['coln']]))
    elif x['Age_days']<=365:
        return x['Age_days'] 
    else:
        return 366

d = {'N_30': lambda x: (30/x), 'N_31_90': lambda x: 30 + (60/x), 'N_91_180': lambda x: 90 + (90/x), 
'N_180_365': lambda x: 180 + (185/x)}

df['recency'] = df.assign(coln = df.filter(like='N').idxmax(axis=1).reset_index(drop=True)).apply(func,axis=1)

df:

   ID  Age_days  N_30  N_31_90  N_91_180  N_180_365  recency
0   1       201    60       15        30          1      0.5
1   2       800     0       15         5         10     34.0
2   3       800     0        0        10          6     99.0
3   4       100     0        0         0        370    180.5
4   5       600     0        6         5         10    198.5
5   6       800     0        0        15          6     96.0
6   7       500    10       10        30          9     93.0
7   8       200     0        0         0          0    200.0
8   9       500     0        0         0          0    366.0

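Correction:

idxmax returns the column with the largest value rather than the first non-zero one (see rows 4 and 6 in the output above), so the column lookup should be: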

 df.filter(like='N').replace(0,np.nan).notna().idxmax(axis=1)

After this correction you get the same result as Solution 1.
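
Putting the correction together with the rest of this answer, a complete version might look like the sketch below. The sample frame is rebuilt from the question's table, and the intermediate name `first_nonzero` is introduced here for readability; everything else follows the code above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Age_days": [201, 800, 800, 100, 600, 800, 500, 200, 500],
    "N_30": [60, 0, 0, 0, 0, 0, 10, 0, 0],
    "N_31_90": [15, 15, 0, 0, 6, 0, 10, 0, 0],
    "N_91_180": [30, 5, 10, 0, 5, 15, 30, 0, 0],
    "N_180_365": [1, 10, 6, 370, 10, 6, 9, 0, 0],
})

# Per-bucket formulas from the question, keyed by column name.
d = {
    "N_30": lambda x: 30 / x,
    "N_31_90": lambda x: 30 + 60 / x,
    "N_91_180": lambda x: 90 + 90 / x,
    "N_180_365": lambda x: 180 + 185 / x,
}


def func(row):
    value = row[row["coln"]]
    if value != 0:
        return d[row["coln"]](value)
    elif row["Age_days"] <= 365:
        return row["Age_days"]
    else:
        return 366


# Name of the first non-zero bucket in each row. For an all-zero row,
# idxmax falls back to the first column ("N_30"), whose value is 0,
# so func drops through to the Age_days rule.
first_nonzero = df.filter(like="N").replace(0, np.nan).notna().idxmax(axis=1)

df["recency"] = df.assign(coln=first_nonzero).apply(func, axis=1)
print(df)
```

Because `apply(func, axis=1)` calls a Python function once per row, this is likely to be slower than `np.select` on large frames, which fits the "for learning purposes" framing of this answer.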

【Comments】: