【Question title】: Logical operators for Boolean indexing in Pandas
【Posted】: 2014-02-20 08:17:59
【Question description】:

I'm working with Boolean indexing in Pandas.

The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error?

Example:

a = pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous.     Use a.any() or a.all()

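【Question discussion】:

This is because numpy arrays and pandas Series use the bitwise operators rather than the logical ones, since you are comparing every element in the array/Series with another. It therefore doesn't make sense to use the logical operator in this situation. See related: stackoverflow.com/questions/8632033/…
In Python, and != &. The and operator in Python cannot be overridden, whereas the & operator (__and__) can. Hence the choice to use & in numpy and pandas.
Related: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Tags: python pandas dataframe boolean filtering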

【Solution 1】:

When you say

(a['x']==1) and (a['y']==10)

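you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to Boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a Boolean value. In other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a Boolean value. That's because it is unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Others might want it to be True if any of its elements are True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit and indicate which behavior you want, by checking the empty attribute or calling the all() or any() method.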
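As a minimal sketch of what "being explicit" looks like, the snippet below uses a small made-up Series (the name s and its values are illustrative, not from the question). Note that empty is an attribute, while any() and all() are methods.

```python
import pandas as pd

# Illustrative data; not taken from the question.
s = pd.Series([True, False, True])

# bool(s) would raise ValueError, so state the intent explicitly instead.
has_any = bool(s.any())   # True: at least one element is True
has_all = bool(s.all())   # False: not every element is True
is_empty = s.empty        # False: the Series holds three elements

print(has_any, has_all, is_empty)

# .empty is an attribute, not a method; a Series with no elements reports True.
no_rows = pd.Series([], dtype=bool)
print(no_rows.empty)
```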

In this case, however, it looks like you do not want Boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

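returns a Boolean array.

By the way, as alexpmil notes, the parentheses are mandatory since & has a higher operator precedence than ==.

Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. Using and with two Series triggers the same ValueError as above. That is why the parentheses are mandatory.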
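The precedence issue can be reproduced with the DataFrame from the question. This is only a sketch: the variable names mask and raised are chosen here for illustration.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# With parentheses: each comparison yields a Boolean Series, then & combines them.
mask = (a['x'] == 1) & (a['y'] == 10)
print(mask.tolist())  # [True, False]

# Without parentheses: 1 & a['y'] is evaluated first, and the resulting
# chained comparison calls bool() on a Series, which raises ValueError.
try:
    a['x'] == 1 & a['y'] == 10
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__)
```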

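【Discussion】:

numpy arrays do have this property if they are of length one. Only the pandas devs (stubbornly) refuse to guess :p
Doesn't '&' carry the same ambiguity as 'and'? Why is it that when it comes to '&', suddenly all users agree it should be element-wise, while when they see 'and', their expectations vary?
@Indominus: The Python language itself requires that the expression x and y triggers the evaluation of bool(x) and bool(y). Python "first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned." So the syntax x and y cannot be used for element-wise logical AND, since only x or y can be returned. In contrast, x & y triggers x.__and__(y), and the __and__ method can be defined to return anything we like.
Important: the parentheses around the == clauses are mandatory. a['x']==1 & a['y']==10 returns the same error as in the question.
What is "|"?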
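To illustrate the point made in the discussion about and versus &, here is a toy class (Mask is hypothetical and written only for this sketch; it is not how NumPy or Pandas are implemented). It shows that & dispatches to __and__, which may return anything, while and goes through __bool__.

```python
class Mask:
    """Toy stand-in for a Boolean array (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Called for `self & other`; free to return an element-wise result.
        return Mask(x and y for x, y in zip(self.values, other.values))

    def __bool__(self):
        # Called by `and`, `or`, `not` and `if`; mimic NumPy/Pandas by refusing.
        raise ValueError("The truth value of a Mask is ambiguous.")


m1 = Mask([True, True, False])
m2 = Mask([True, False, False])

combined = m1 & m2
print(combined.values)  # [True, False, False]

try:
    m1 and m2           # Python calls bool(m1) first, which raises
    and_raised = False
except ValueError:
    and_raised = True
print(and_raised)
```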

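【Solution 2】:

TLDR: The logical operators in Pandas are &, | and ~, and parentheses (...) are important!

Python's and, or and not logical operators are designed to work with scalars. So Pandas had to go one step further and override the bitwise operators to provide a vectorized (element-wise) version of this functionality.

So the following in Python (exp1 and exp2 are expressions which evaluate to a Boolean result)...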

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

...will translate to...

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

for Pandas.

If you get a ValueError while performing a logical operation, you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

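and so on.

Boolean Indexing: A common operation is to compute Boolean masks through logical conditions to filter the data. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup: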

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

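Logical AND

For df above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing the mask for each condition separately and ANDing them.

Overloaded bitwise & operator
Before continuing, please take note of this particular excerpt of the docs, which states

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &: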

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

And the subsequent filtering step is simply,

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

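Parentheses are needed to override the default precedence of the bitwise operators, which bind more tightly than the comparison operators < and >. See the Operator Precedence section of the Python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as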

df['A'] < 5 & df['B'] > 5

it is parsed as

df['A'] < (5 & df['B']) > 5

which becomes,

df['A'] < something_you_dont_want > 5

which becomes (see the Python docs on chained operator comparison),

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

which becomes,

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

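So, don't make that mistake!¹

Avoiding parentheses grouping
The fix is actually quite simple. Most operators have a corresponding bound method on Series and DataFrames. If the individual masks are built using these methods instead of the comparison operators, you no longer need to group by parentheses to specify the evaluation order: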

df['A'].lt(5)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on Flexible Comparisons. To summarize, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛
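The correspondence in the table can be checked mechanically. The sketch below assumes a Series holding the values of column A shown earlier and an arbitrary threshold of 4; the names pairs and results are illustrative.

```python
import operator

import pandas as pd

# Values copied from column A of the df shown earlier.
s = pd.Series([5, 3, 3, 4, 8], name='A')

# Each comparison operator paired with the name of its method equivalent.
pairs = [
    (operator.gt, 'gt'), (operator.ge, 'ge'),
    (operator.lt, 'lt'), (operator.le, 'le'),
    (operator.eq, 'eq'), (operator.ne, 'ne'),
]

results = {}
for op, name in pairs:
    via_operator = op(s, 4)            # e.g. s > 4
    via_method = getattr(s, name)(4)   # e.g. s.gt(4)
    results[name] = via_operator.equals(via_method)

print(results)
```

Every entry in results should be True, since each method is the functional form of the operator next to it.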

Another option for avoiding parentheses is to use DataFrame.query (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

I have extensively documented query and eval in Dynamic Expression Evaluation in pandas using pd.eval().
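A short sketch of query in practice, with the df values written out explicitly so the example does not depend on the random seed. Inside a query string, & is parsed like and, so no parentheses are needed; the @ prefix refers to a local Python variable (limit is a name made up for this example).

```python
import pandas as pd

# Same values as the df shown earlier, written out explicitly.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

plain = df[(df['A'] < 5) & (df['B'] > 5)]   # regular Boolean indexing
with_and = df.query('A < 5 and B > 5')      # `and` works inside query
with_amp = df.query('A < 5 & B > 5')        # so does an unparenthesized &

limit = 5                                   # hypothetical local variable
with_local = df.query('A < @limit and B > @limit')

print(with_and.equals(plain), with_amp.equals(plain), with_local.equals(plain))
```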

operator.and_
Allows you to perform this operation in a functional manner. Internally it calls Series.__and__, which corresponds to the bitwise operator.

import operator 

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5) 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

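You won't usually need this, but it is useful to know.

Generalizing: np.logical_and (and logical_and.reduce)
Another alternative is np.logical_and, which also does not need parentheses grouping: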

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

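np.logical_and is a ufunc (Universal Functions), and most ufuncs have a reduce method. This means it is easier to generalize with logical_and if you have multiple masks to AND. For example, to AND the masks m1, m2 and m3 with &, you would have to write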

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

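This is powerful because it lets you build more complex logic on top of it (for example, dynamically generating masks in a list comprehension and combining all of them):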

import operator

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m 
# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6
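As a quick sanity check, the sketch below confirms that reduce gives the same answer as chaining &. The third condition on column C is an assumption added for illustration. Note that reduce returns a plain NumPy array, so the index is lost unless you wrap the result back into a Series.

```python
import numpy as np
import pandas as pd

# Same values as the df shown earlier, written out explicitly.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

m1 = df['A'] < 5
m2 = df['B'] > 5
m3 = df['C'] > 5   # extra condition, made up for this example

chained = m1 & m2 & m3                          # Boolean Series
reduced = np.logical_and.reduce([m1, m2, m3])   # plain NumPy array

print(np.array_equal(chained.to_numpy(), reduced))

# Re-attach the index if a Series is needed downstream.
reduced_series = pd.Series(reduced, index=df.index)
print(df[reduced_series])
```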

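¹ I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake, and it must be explained thoroughly.

Logical OR

For df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded bitwise |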

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

If you haven't yet, please also read the section on Logical AND above; all caveats apply here.

Alternatively, this operation can be specified with

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_
Calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or
For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

Logical NOT

Given a mask, such as

mask = pd.Series([True, True, False])

If you need to invert every Boolean value (so that the end result is [False, False, True]), then you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesized.

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

This internally calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

But don't use it directly.

operator.inv
Internally calls __invert__ on the Series.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not
This is the numpy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note that np.logical_and can be substituted for np.bitwise_and, logical_or for bitwise_or, and logical_not for invert.
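The interchangeability noted above holds for Boolean masks, which the following sketch checks on two small made-up Series (m1 and m2 are illustrative names and values).

```python
import numpy as np
import pandas as pd

# Illustrative Boolean masks covering all four True/False combinations.
m1 = pd.Series([True, True, False, False])
m2 = pd.Series([True, False, True, False])

# For Boolean inputs, the logical and bitwise ufuncs agree element by element.
same_and = np.logical_and(m1, m2).equals(np.bitwise_and(m1, m2))
same_or = np.logical_or(m1, m2).equals(np.bitwise_or(m1, m2))
same_not = np.logical_not(m1).equals(np.invert(m1))

# They also agree with the overloaded operators.
same_as_ops = (m1 & m2).equals(np.logical_and(m1, m2)) and (~m1).equals(np.invert(m1))

print(same_and, same_or, same_not, same_as_ops)
```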

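【Discussion】:

@cs95 In the TLDR, for element-wise Boolean OR, you advocate using |, which is equivalent to numpy.bitwise_or, instead of numpy.logical_or. May I ask why? Isn't numpy.logical_or designed specifically for this task? Why add the burden of doing it bitwise for each pair of elements?
@flow2k Can you quote the relevant text, please? I cannot find what you're referring to. FWIW I maintain that logical_* is the correct functional equivalent of the operators.
@cs95 I'm referring to the first line of the answer: "TLDR; Logical Operators in Pandas are &, | and ~".
@flow2k It is literally in the documentation: "Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not."
@cs95, OK, I just read this section, and it does use | for element-wise Boolean operations. But to me, that documentation is more of a "tutorial", and in contrast I feel these API references are closer to the source of truth: numpy.bitwise_or and numpy.logical_or. So I'm trying to make sense of what is described here.

【Solution 3】:

Logical operators for Boolean indexing in Pandas

It's important to realize that you cannot use any of the Python logical operators (and, or, or not) on pandas.Series or pandas.DataFrame objects (similarly, you cannot use them on numpy.arrays with more than one element). The reason is that they implicitly call bool on their operands, which throws an exception because these data structures decided that the Boolean value of an array is ambiguous: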

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I did cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

NumPy's logical functions

However, NumPy provides element-wise equivalents to these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

and has np.logical_and
or has np.logical_or
not has np.logical_not
numpy.logical_xor has no Python equivalent, but it is a logical "exclusive or" operation

So, essentially, one should use (assuming df1 and df2 are Pandas DataFrames):

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)
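A minimal sketch of these calls on two small Boolean DataFrames (the column names and values are made up for illustration). The functions operate element by element on aligned cells.

```python
import numpy as np
import pandas as pd

# Illustrative Boolean DataFrames with matching index and columns.
df1 = pd.DataFrame({'a': [True, True], 'b': [False, False]})
df2 = pd.DataFrame({'a': [True, False], 'b': [True, False]})

both = np.logical_and(df1, df2)      # True only where both cells are True
either = np.logical_or(df1, df2)     # True where at least one cell is True
negated = np.logical_not(df1)        # flips every cell of df1
exclusive = np.logical_xor(df1, df2) # True where exactly one cell is True

print(both)
print(exclusive)
```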

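Bitwise functions and bitwise operators for Booleans

However, if you have a Boolean NumPy array, Pandas Series, or Pandas DataFrame, you can also use the element-wise bitwise functions (for Booleans they are, or at least should be, indistinguishable from the logical functions):

bitwise and: np.bitwise_and or the & operator
bitwise or: np.bitwise_or or the | operator
bitwise not: np.invert (or the alias np.bitwise_not) or the ~ operator
bitwise xor: np.bitwise_xor or the ^ operator

Typically the operators are used. However, when they are combined with comparison operators, you have to remember to wrap each comparison in parentheses, because the bitwise operators have a higher precedence than the comparison operators: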

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

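This may be irritating, because the Python logical operators have a lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, simple integers) and don't need the parentheses.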
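The contrast between scalars and Series can be sketched as follows; a, b, s1 and s2 are illustrative names and values, not taken from the answer.

```python
import pandas as pd

# Plain integers: `and` binds more loosely than < and >, so no parentheses.
a, b = 5, 20
scalar_result = a < 10 and b > 10
print(scalar_result)  # True

# Series: | binds more tightly than < and >, so each comparison is wrapped.
s1 = pd.Series([5, 15])
s2 = pd.Series([20, 5])
mask = (s1 < 10) | (s2 > 10)
print(mask.tolist())  # [True, False]
```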

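Differences between logical and bitwise operations (on non-Booleans)

It is really important to stress that bitwise and logical operations are only equivalent for Boolean NumPy arrays (and Boolean Series and DataFrames). If these don't contain Booleans, the operations will give different results. I'll include examples using NumPy arrays, but the results are similar for the pandas data structures: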

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

And since NumPy (and similarly Pandas) does different things for Boolean (Boolean or "mask" index arrays) and integer (Index arrays) indices, the results of indexing will also differ:

>>> a3 = np.array([1, 2, 3, 4])

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
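The arrays above contain only 0 and 1, so the logical and bitwise results still agree in truthiness. With other integers the values themselves diverge, as this small sketch shows (x and y are made-up arrays).

```python
import numpy as np

# Illustrative non-Boolean inputs; every element is non-zero (truthy).
x = np.array([1, 2, 3])
y = np.array([2, 1, 3])

logical = np.logical_and(x, y)   # treats each non-zero element as True
bitwise = np.bitwise_and(x, y)   # ANDs the binary representations

print(logical.tolist())  # [True, True, True]
print(bitwise.tolist())  # [0, 0, 3]  (1 & 2 == 0, 2 & 1 == 0, 3 & 3 == 3)
```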

Summary table

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

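The logical operators do not work for NumPy arrays, Pandas Series, and pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise. However, be careful with bitwise inversion on plain Python bools, because the bool is interpreted as an integer in this context (for example, ~False returns -1 and ~True returns -2).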
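The pitfall with plain Python bools can be seen in a few lines. This is a sketch; recent Python versions (3.12 and later, as far as I know) emit a DeprecationWarning for ~ on bool, which is silenced here so the example runs cleanly.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Newer Python versions warn about ~ on bool; ignore that for the demo.
    warnings.simplefilter("ignore", DeprecationWarning)
    inverted_true = ~True     # bool is treated as the integer 1, so ~1 == -2
    inverted_false = ~False   # ~0 == -1

print(inverted_true, inverted_false)   # -2 -1
print(not True, not False)             # False True

# On a Boolean NumPy array, ~ is an element-wise logical NOT.
elementwise = ~np.array([True, False])
print(elementwise.tolist())            # [False, True]
```

Both -2 and -1 are truthy, so using ~ on a plain bool in a condition silently does the wrong thing; use not for scalars and ~ for arrays and Series.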
