【Question title】: np.where condition is not getting satisfied
【Posted】: 2021-05-05 09:09:29
【Question】:

The following line of code raises the error shown further down.

d3["WOE"] = np.where(((d3.DIST_EVENT==0) | (d3.DIST_NON_EVENT ==0)) ,np.nan ,np.log(d3.DIST_EVENT/d3.DIST_NON_EVENT))

If the numerator or the denominator is 0, the condition should be satisfied and d3["WOE"] should be NaN. Why do I get the following error?

---------------------------------------------------------------------------
FloatingPointError                        Traceback (most recent call last)
<ipython-input-56-a9b015683238> in <module>
----> 1 final_iv, IV = data_vars(df_leads_short,df_leads_short.close_flag)
      2 IV.sort_values('IV')

<ipython-input-55-5530ad13fa5a> in data_vars(df1, target)
    122                 count = count + 1
    123             else:
--> 124                 conv = char_bin(target, df1[i])
    125                 conv["VAR_NAME"] = i
    126                 count = count + 1

<ipython-input-55-5530ad13fa5a> in char_bin(Y, X)
     92     d3["DIST_EVENT"] = d3.EVENT/d3.sum().EVENT
     93     d3["DIST_NON_EVENT"] = d3.NONEVENT/d3.sum().NONEVENT
---> 94     d3["WOE"] = np.where(((d3.DIST_EVENT==0) | (d3.DIST_NON_EVENT ==0)) ,np.nan ,np.log(d3.DIST_EVENT/d3.DIST_NON_EVENT))
     95     #d3["WOE"] = np.log(d3.DIST_EVENT/d3.DIST_NON_EVENT)
     96     d3["IV"] = np.where((d3.DIST_EVENT==0) | (d3.DIST_NON_EVENT ==0 ),np.nan ,(d3.DIST_EVENT-d3.DIST_NON_EVENT)*np.log(d3.DIST_EVENT/d3.DIST_NON_EVENT))

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   1934         self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
   1935     ):
-> 1936         return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
   1937 
   1938     # ideally we would define this to avoid the getattr checks, but

/opt/conda/lib/python3.7/site-packages/pandas/core/arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
    356         # ufunc(series, ...)
    357         inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
--> 358         result = getattr(ufunc, method)(*inputs, **kwargs)
    359     else:
    360         # ufunc(dataframe)

FloatingPointError: divide by zero encountered in log

【Comments】:

  • np.where is an ordinary Python function. Its arguments are fully evaluated before they are passed in.
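The comment above can be checked with a small sketch. The arrays `event` and `non_event` are made-up stand-ins for the question's columns. NumPy only warns on log(0) by default, so the FloatingPointError in the traceback suggests the floating-point error state was set to raise somewhere (for example with np.seterr); the sketch assumes that and reproduces it with np.errstate.

```python
import numpy as np

# Hypothetical stand-ins for d3.DIST_EVENT and d3.DIST_NON_EVENT.
event = np.array([0.0, 0.5, 0.5])
non_event = np.array([0.2, 0.3, 0.5])

raised = False
message = ""

# Assumption: divide-by-zero is configured to raise, as the traceback implies.
with np.errstate(divide="raise"):
    try:
        # np.log(...) runs over ALL elements before np.where is even called,
        # so the zero in `event` reaches np.log despite the condition.
        np.where(event == 0, np.nan, np.log(event / non_event))
    except FloatingPointError as exc:
        raised = True
        message = str(exc)

print(raised, message)
```

The condition never gets a chance to protect the first element, because the third argument is already a fully computed array by the time np.where sees it.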

Tags: pandas numpy


【Solution 1】:

We can compute the logarithm only on the rows where the condition is false:

cond = ((d3.DIST_EVENT==0) | (d3.DIST_NON_EVENT ==0))
d3.loc[~cond,"WOE"] = np.log(d3.loc[~cond,"DIST_EVENT"]/d3.loc[~cond,"DIST_NON_EVENT"])

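np.where still has to be handed the result of np.log(d3.DIST_EVENT/d3.DIST_NON_EVENT) for every row, so the original line raises the same error no matter what the condition is. np.where only selects between values that have already been computed.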
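A self-contained sketch of this masking approach follows. The toy counts in `d3` are invented for illustration, and initializing the WOE column to NaN up front is an added step for clarity rather than part of the answer. The IV line is one possible way to adapt the question's second np.where call, relying on NaN propagating through the multiplication.

```python
import numpy as np
import pandas as pd

# Invented toy counts; the real d3 comes from the asker's char_bin function.
d3 = pd.DataFrame({"EVENT": [0, 3, 7], "NONEVENT": [4, 6, 0]})
d3["DIST_EVENT"] = d3["EVENT"] / d3["EVENT"].sum()
d3["DIST_NON_EVENT"] = d3["NONEVENT"] / d3["NONEVENT"].sum()

# Rows where either distribution is zero must end up as NaN.
cond = (d3["DIST_EVENT"] == 0) | (d3["DIST_NON_EVENT"] == 0)

# Assumption: divide-by-zero raises, mirroring the traceback in the question.
with np.errstate(divide="raise"):
    d3["WOE"] = np.nan  # explicit default for the masked-out rows
    ratio = d3.loc[~cond, "DIST_EVENT"] / d3.loc[~cond, "DIST_NON_EVENT"]
    d3.loc[~cond, "WOE"] = np.log(ratio)  # np.log only ever sees safe rows
    # NaN in WOE propagates, so IV is NaN on exactly the same rows.
    d3["IV"] = (d3["DIST_EVENT"] - d3["DIST_NON_EVENT"]) * d3["WOE"]

print(d3[["DIST_EVENT", "DIST_NON_EVENT", "WOE", "IV"]])
```

Because the zero rows are filtered out before np.log is called, no divide-by-zero occurs, whatever the error state is set to.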

【Comments】:

  • I see. I had not realized that. Your solution works perfectly. Thank you.