【Question title】: Iterating over rules using if else
【Posted】: 2019-04-16 20:57:12
【Question】:

I have a main table that contains the rates for each product, along with the corresponding package and risk class.

df = pd.DataFrame({'package': {0: 'basic', 1: 'medium', 2: 'premium', 3:'basic', 4:'medium', 5:'premium'},
   'risk_bin': {0: 'good/mid', 1: 'good/mid', 2: 'good/mid', 3:'bad', 4:'bad',5:'bad'},
   'A': {0:0.012,1:0.022,2:0.032,3:0.05,4:0.06,5:0.07},
   'B': {0:0.013,1:0.023,2:0.033,3:0.051,4:0.061,5:0.071},
   'C': {0:0.014,1:0.024,2:0.034,3:0.052,4:0.062,5:0.072},
   'D': {0:0.015,1:0.025,2:0.035,3:0.053,4:0.063,5:0.073}})
# order the columns explicitly by name rather than by position
df = df[['package', 'risk_bin', 'A', 'B', 'C', 'D']]

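In a second table I capture the user's options: the user can create any number of rules for these products based on the rates of other products. Each rule can be restricted to a specific package or risk bin.

So, in the example below, product B takes the rate of product A plus 5%, only for the basic package and good/mid risk. Product C takes the rate of D plus 10% for all packages, only for bad risk.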

rules = pd.DataFrame({'rule': {0: '1', 1: '2'},
   'product1': {0: 'B', 1: 'C'},
   'relantioship': {0:'=',1:'='},
   'product2': {0:'A',1:'D'},
   'symbol': {0:'+',1:'-'},
   'value': {0:0.05,1:0.10},
   'package':{0:'basic',1:'all'},
   'risk': {0:'good/mid', 1:'bad'}})
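# order the columns explicitly by name rather than by position
rules = rules[['rule', 'product1', 'relantioship', 'product2', 'symbol', 'value', 'package', 'risk']]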

Because there can be as many rules as the user wants, I need a loop that passes each rule's values to the relationship it defines.

df2 = df.reset_index()

rules_nc = rules['rule'].to_numpy()  # get_values() was removed in pandas 1.0
nc_cnt = rules_nc.size     

for i in range(nc_cnt):
    if pd.isnull(rules['rule'][i]):
        break
    product_1 = rules['product1'][i]
    product_2 = rules['product2'][i]
    sym = str(rules['symbol'][i])
    val = rules['value'][i]
    pack= rules['package'][i]
    risk = rules['risk'][i]        

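    # the condition below is a boolean Series, not a single True/False,
    # so this `if` is the line that raises the error
    if (df2['risk_bin']==risk) & (df2['package']==pack):
        if sym=='+':
            df2[product_1] = df2[product_2] + val
        if sym=='-':
            df2[product_1] = df2[product_2] - val
    else:
        df2[product_1] = df2[product_1]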

When I do this, I get the following error:

 The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This is the output I expect for this set of rules.

results = pd.DataFrame({'package': {0: 'basic', 1: 'medium', 2: 'premium', 3:'basic', 4:'medium', 5:'premium'},
   'risk_bin': {0: 'good/mid', 1: 'good/mid', 2: 'good/mid', 3:'bad', 4:'bad',5:'bad'},
   'A': {0:0.012,1:0.022,2:0.032,3:0.05,4:0.06,5:0.07},
   'B': {0:0.062,1:0.023,2:0.033,3:0.1,4:0.061,5:0.071},
   'C': {0:0.014,1:0.024,2:0.034,3:0.153,4:0.163,5:0.173},
   'D': {0:0.015,1:0.025,2:0.035,3:0.053,4:0.063,5:0.073}})

# order the columns explicitly by name rather than by position
results = results[['package', 'risk_bin', 'A', 'B', 'C', 'D']]

Can you help me? Thank you very much!

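【Comments】:

  • Could you provide the output you expect for the input you gave? This may be a job for pandas.merge() rather than iteration: pandas.pydata.org/pandas-docs/stable/generated/…
  • @smj The rules and the number of rules can change from user to user. How would I use pandas.merge() to solve this? Could you give me an example? I have just added the expected output. Thanks
  • Could you try to explain in words what your loop is trying to do? It is not clear to me.
  • @user32185 I have just added an explanation in words. Thanks
  • If you like, I can share a solution that works for each batch of rules. But if the rules are ambiguous, it could lead to strange results.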

Tags: python pandas loops if-statement


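【Solution 1】:

Here is one possible solution. It is not ideal, because it uses apply, which is faster than a loop but not as fast as a vectorized solution. I renamed risk to risk_bin in rules so that both tables share the same column name.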

import pandas as pd

df = pd.DataFrame({'package': {0: 'basic', 1: 'medium', 2: 'premium', 3:'basic', 4:'medium', 5:'premium'},
   'risk_bin': {0: 'good/mid', 1: 'good/mid', 2: 'good/mid', 3:'bad', 4:'bad',5:'bad'},
   'A': {0:0.012,1:0.022,2:0.032,3:0.05,4:0.06,5:0.07},
   'B': {0:0.013,1:0.023,2:0.033,3:0.051,4:0.061,5:0.071},
   'C': {0:0.014,1:0.024,2:0.034,3:0.052,4:0.062,5:0.072},
   'D': {0:0.015,1:0.025,2:0.035,3:0.053,4:0.063,5:0.073}})
# order the columns explicitly by name rather than by position
df = df[['package', 'risk_bin', 'A', 'B', 'C', 'D']]

rules = pd.DataFrame({'rule': {0: '1', 1: '2'},
   'product1': {0: 'B', 1: 'C'},
   'relantioship': {0:'=',1:'='},
   'product2': {0:'A',1:'D'},
   'symbol': {0:'+',1:'-'},
   'value': {0:0.05,1:0.10},
   'package':{0:'basic',1:'all'},
   'risk_bin': {0:'good/mid', 1:'bad'}})
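# order the columns explicitly by name rather than by position
rules = rules[['rule', 'product1', 'relantioship', 'product2', 'symbol', 'value', 'package', 'risk_bin']]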

def fun(row):
    if row["symbol"] == "+":
        row[row["product1"]] = row[row["product2"]] + row["value"]
    else:
        row[row["product1"]] = row[row["product2"]] - row["value"]
    return row

# rows of df whose package and risk_bin both match a rule
df1 = pd.merge(df.reset_index(), rules, on=["package", "risk_bin"])
# rules whose package is 'all' are matched on risk_bin only
df2 = pd.merge(df.reset_index(),
               rules[rules["package"]=='all'].loc[:, rules.columns != "package"],
               on=["risk_bin"])
# apply the rule arithmetic row by row to both merged frames
df1 = df1.apply(fun, axis=1)
df2 = df2.apply(fun, axis=1)

# flag the rows of df that were matched by a rule
bad_idx = df.index.isin(df1["index"].tolist()+df2["index"].tolist())

# concatenate the updated rows with the untouched rows
# (the row order of res differs from the row order of df)
res = pd.concat([df1[df.columns], df2[df.columns], df[~bad_idx]],ignore_index=True)
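
For comparison, here is a minimal sketch of the vectorized approach mentioned at the top of this answer. It is not part of the original answer. It still loops over the rules, but each rule is applied to all matching rows at once through a boolean mask and .loc, which also avoids the "truth value of a Series is ambiguous" error from the question. The names apply_rules and OPS are hypothetical helpers introduced here. The sketch follows the rules as described in words (rule 2 adds 0.10, whereas the posted rules table has '-'), so treat the printed rates as illustrative rather than as a reproduction of the expected output in the question.

```python
import operator

import pandas as pd

# Same rate table as in the question, built with an explicit column order.
df = pd.DataFrame({
    'package': ['basic', 'medium', 'premium', 'basic', 'medium', 'premium'],
    'risk_bin': ['good/mid', 'good/mid', 'good/mid', 'bad', 'bad', 'bad'],
    'A': [0.012, 0.022, 0.032, 0.05, 0.06, 0.07],
    'B': [0.013, 0.023, 0.033, 0.051, 0.061, 0.071],
    'C': [0.014, 0.024, 0.034, 0.052, 0.062, 0.072],
    'D': [0.015, 0.025, 0.035, 0.053, 0.063, 0.073],
})

# Assumption: rule 2 uses '+', following the wording of the question.
rules = pd.DataFrame({
    'product1': ['B', 'C'],
    'product2': ['A', 'D'],
    'symbol': ['+', '+'],
    'value': [0.05, 0.10],
    'package': ['basic', 'all'],
    'risk_bin': ['good/mid', 'bad'],
})

# Hypothetical lookup from a rule's symbol to the arithmetic it stands for.
OPS = {'+': operator.add, '-': operator.sub}


def apply_rules(rates, rules):
    """Return a copy of `rates` with every rule applied to its matching rows."""
    out = rates.copy()
    for rule in rules.itertuples(index=False):
        # Element-wise comparison: one True/False per row of `out`.
        mask = out['risk_bin'] == rule.risk_bin
        if rule.package != 'all':  # 'all' means no package restriction
            mask = mask & (out['package'] == rule.package)
        # Assign only on the matching rows; other rows keep their rate.
        out.loc[mask, rule.product1] = OPS[rule.symbol](
            out.loc[mask, rule.product2], rule.value)
    return out


res = apply_rules(df, rules)
print(res)
```

Because the mask selects rows in place, the result keeps the row order of df, and rules are applied in the order they appear in the rules table, so a later rule sees the rates already changed by an earlier one.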

【Comments】:

  • Thank you very much @user32185