
【Question title】: Is there a Python function that can filter and assign binary value to a column in the dataset based on a condition? [duplicate]
【Posted】: 2019-08-22 07:30:46
【Question description】:

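I'm having trouble filtering my result DataFrame. My dataset has a column named PaymentAmount that contains numeric data, and I want to assign a value to each row:

1 if data['PaymentAmount'] > 25000, and
0 if data['PaymentAmount'] <= 25000

I tried the following: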

1 if data['PaymentAmount'] >= 25000 else 0

but I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-176-e368653724d0> in <module>
----> 1 1 if data['PaymentAmount'] >= 25000 else 0

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577 
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

【Comments on the question】:

Tags: python pandas dataframe filtering

【Solution 1】:

A vectorized solution is the better choice here: convert the boolean mask to integers, which maps True/False to 1/0:

data['new'] = (data['PaymentAmount'] > 25000).astype(int)

Or use numpy.where:

import numpy as np
data['new'] = np.where(data['PaymentAmount'] > 25000, 1, 0)

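Your own approach does work once the conditional expression is wrapped in a lambda function and passed to apply, but it is slow because apply loops over the values under the hood. The comparison below uses > rather than >= so that it matches the condition stated in the question:

data['new'] = data['PaymentAmount'].apply(lambda x: 1 if x > 25000 else 0)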
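
To make the comparison concrete, here is a minimal sketch that runs the three approaches side by side. The sample values are made up for illustration (only the column name PaymentAmount comes from the question), and the variable names mask, raised, flag_astype, flag_where and flag_apply are hypothetical. It also reproduces the cause of the original error: the comparison returns a boolean Series, and Python's `if` cannot reduce a multi-element Series to a single True/False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 25000 is included to show the boundary case.
data = pd.DataFrame({"PaymentAmount": [10000, 25000, 25001, 40000]})

# The comparison is evaluated element by element and yields a boolean Series.
mask = data["PaymentAmount"] > 25000

# `1 if <Series> else 0` calls bool() on the Series, which raises ValueError.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# Three ways to turn the mask into 1/0 flags.
flag_astype = mask.astype(int)                      # cast True/False to 1/0
flag_where = np.where(mask, 1, 0)                   # vectorized if/else
flag_apply = data["PaymentAmount"].apply(           # per-element Python call
    lambda x: 1 if x > 25000 else 0
)

print(raised)                # True
print(flag_astype.tolist())  # [0, 0, 1, 1]
```

All three give the same flags for this sample. Note that the row with exactly 25000 gets 0 under the > comparison; with >= it would get 1, so choose the operator that matches the rule you actually want.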

【Comments】: