[Question title]: pandas logical and operator with and without brackets produces different results [duplicate]
[Posted]: 2017-07-09 08:17:13
[Question description]:

I just noticed this:

df[df.condition1 & df.condition2]
df[(df.condition1) & (df.condition2)]

Why do these two lines give different output?


I can't share the exact data, but I'll try to give as much detail as I can:

df[df.col1 == False & df.col2.isnull()] # returns 33 rows and the rule `df.col2.isnull()` is not in effect
df[(df.col1 == False) & (df.col2.isnull())] # returns 29 rows and both conditions are applied correctly 

Thanks to @jezrael and @ayhan, here is what happens. Let me use the example provided by @jezrael:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[True, False, False, False],
                   'col2':[4, np.nan, np.nan, 1]})

print (df)
    col1  col2
0   True   4.0
1  False   NaN
2  False   NaN
3  False   1.0

If we look at row 3:

    col1  col2
3  False   1.0

and at the way I wrote the condition:

df.col1 == False & df.col2.isnull() # for row 3 this is equivalent to False == False & False

Because & has higher precedence than ==, False == False & False without parentheses is equivalent to:

False == (False & False)
print(False == (False & False)) # prints True

With parentheses:

print((False == False) & False) # prints False
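One way to check this grouping without running any pandas code is to ask Python's own parser. The sketch below is an illustration, not part of the original post: the helper name explicit_parens is hypothetical, it only handles &, |, ^ and the comparison operators, and it relies on ast.unparse, which needs Python 3.9 or later.

```python
import ast

# Symbols for the operator node types this sketch knows about (an assumption:
# any other node is printed as written via ast.unparse).
_BINOPS = {ast.BitAnd: "&", ast.BitOr: "|", ast.BitXor: "^"}
_CMPOPS = {ast.Eq: "==", ast.NotEq: "!=", ast.Lt: "<",
           ast.LtE: "<=", ast.Gt: ">", ast.GtE: ">="}


def explicit_parens(source: str) -> str:
    """Hypothetical helper: re-print an expression with every &, |, ^ and
    comparison wrapped in parentheses, exposing the grouping the parser chose."""

    def show(node: ast.AST) -> str:
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return f"({show(node.left)} {_BINOPS[type(node.op)]} {show(node.right)})"
        if isinstance(node, ast.Compare):
            parts = [show(node.left)]
            for op, right in zip(node.ops, node.comparators):
                parts += [_CMPOPS.get(type(op), type(op).__name__), show(right)]
            return "(" + " ".join(parts) + ")"
        # Names, constants, attributes, calls, subscripts: print as written.
        return ast.unparse(node)

    return show(ast.parse(source, mode="eval").body)


print(explicit_parens("False == False & False"))    # (False == (False & False))
print(explicit_parens("(False == False) & False"))  # ((False == False) & False)
print(explicit_parens("df.col1 == False & df.col2.isnull()"))
# (df.col1 == (False & df.col2.isnull()))
```

The first line comes out as False == (False & False), which is the reading described above; the parenthesized version puts & at the root instead.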

I think the problem is a bit easier to illustrate with numbers:

print(5 == 5 & 1) # prints False, because 5 & 1 returns 1 and 5==1 returns False
print(5 == (5 & 1)) # prints False, same reason as above
print((5 == 5) & 1) # prints 1, because 5 == 5 returns True, and True & 1 returns 1
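The same evaluations can be spelled out one step at a time with the operator module. This is only an illustration of the arithmetic above; operator.and_ and operator.eq are the standard-library function forms of & and ==.

```python
import operator

# 5 == 5 & 1 is parsed as 5 == (5 & 1)
step = operator.and_(5, 1)          # 0b101 & 0b001 -> 1
print(step, operator.eq(5, step))   # 1 False

# (5 == 5) & 1: the comparison runs first and gives True; bool & int falls
# back to integer &, so the result is the int 1, not True.
first = operator.eq(5, 5)
result = operator.and_(first, 1)
print(first, result, type(result).__name__)  # True 1 int
```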

So, lesson learned: always use parentheses!!!
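A minimal sketch of three ways to write the filter correctly, reusing the example frame from above (the variable names a, b, c, not_col1 and missing_col2 are only for illustration). The second form works because attribute access and method calls such as Series.eq bind tighter than &, so there is no bare == left for & to capture.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# 1. Parenthesize every comparison before combining with &.
a = df[(df.col1 == False) & (df.col2.isnull())]

# 2. Use comparison methods: method calls bind tighter than &,
#    so no extra parentheses are needed.
b = df[df.col1.eq(False) & df.col2.isnull()]

# 3. Build named masks first, then combine them.
not_col1 = df.col1 == False
missing_col2 = df.col2.isnull()
c = df[not_col1 & missing_col2]

print(a.index.tolist())  # [1, 2]
```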

I wish I could split the answer points between @jezrael and @ayhan :(

[Question discussion]:

    Tags: python pandas


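    [Solution 1]:

    There is no difference between df[condition1 & condition2] and df[(condition1) & (condition2)]. The difference only appears when you write the comparisons inline, because the & operator then takes precedence: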

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
    df
    Out: 
       a  b  c
    0  5  0  3
    1  3  7  9
    2  3  5  2
    3  4  7  6
    4  8  8  1
    
    condition1 = df['a'] > 3
    condition2 = df['b'] < 5
    
    df[condition1 & condition2]
    Out: 
       a  b  c
    0  5  0  3
    
    df[(condition1) & (condition2)]
    Out: 
       a  b  c
    0  5  0  3
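To reproduce this without the random numbers, the sketch below hard-codes the values shown in the output above and checks that both spellings select the same rows. The last line compares parse trees, as a rough demonstration that parentheses around a bare name are pure grouping.

```python
import ast
import pandas as pd

# The values printed in the answer's output above (originally random data).
df = pd.DataFrame({'a': [5, 3, 3, 4, 8],
                   'b': [0, 7, 5, 7, 8],
                   'c': [3, 9, 2, 6, 1]})

condition1 = df['a'] > 3
condition2 = df['b'] < 5

plain = df[condition1 & condition2]
wrapped = df[(condition1) & (condition2)]
print(plain.equals(wrapped))  # True
print(plain.index.tolist())   # [0]

# Parentheses around a bare name only group: the parser builds the same tree.
same_tree = ast.dump(ast.parse("c1 & c2")) == ast.dump(ast.parse("(c1) & (c2)"))
print(same_tree)  # True
```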
    

    However, if you type it like this, you will see an error:

    df[df['a'] > 3 & df['b'] < 5]
    Traceback (most recent call last):
    
      File "<ipython-input-7-9d4fd21246ca>", line 1, in <module>
        df[df['a'] > 3 & df['b'] < 5]
    
      File "/home/ayhan/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 892, in __nonzero__
        .format(self.__class__.__name__))
    
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    

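    This is because 3 & df['b'] is evaluated first (this corresponds to False & df.col2.isnull() in your example). What remains, df['a'] > (3 & df['b']) < 5, is a chained comparison, which Python evaluates as df['a'] > (3 & df['b']) and (3 & df['b']) < 5; the implicit and needs the truth value of a Series, hence the ValueError. (Your expression contains only one ==, so nothing is chained and you silently get a different mask instead of an error.) So you need to group the conditions in parentheses: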

    df[(df['a'] > 3) & (df['b'] < 5)]
    Out[8]: 
       a  b  c
    0  5  0  3
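The error path can be reproduced on a small, made-up frame (the values below are hypothetical, not the answer's random data). The sketch collects the error messages instead of letting them propagate, and shows that writing out the chained comparison by hand fails the same way.

```python
import pandas as pd

# Hypothetical data, chosen only to make the masks easy to follow.
df = pd.DataFrame({'a': [5, 3, 8], 'b': [0, 7, 8]})
errors = []

# Parsed as the chained comparison  df['a'] > (3 & df['b']) < 5,
# i.e.  df['a'] > (3 & df['b']) and (3 & df['b']) < 5.
try:
    df[df['a'] > 3 & df['b'] < 5]
except ValueError as exc:
    errors.append(str(exc))

# Writing the chained form out by hand: the implicit `and` calls bool()
# on a boolean Series, which pandas refuses to do.
masked = 3 & df['b']   # bitwise and on the integers: [0, 3, 0]
try:
    df['a'] > masked and masked < 5
except ValueError as exc:
    errors.append(str(exc))

print(len(errors))  # 2

# Parenthesized: two boolean Series combined element-wise with &.
result = df[(df['a'] > 3) & (df['b'] < 5)]
print(result.index.tolist())  # [0]
```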
    

    [Discussion]:

      [Solution 2]:

      You're right, it is different. I think this is an operator precedence issue - check the docs:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'col1':[True, False, False, False],
                         'col2':[4, np.nan, np.nan, 1]})
      
      print (df)
          col1  col2
      0   True   4.0
      1  False   NaN
      2  False   NaN
      3  False   1.0
      
      # operator & precedence
      print (df[df.col1 == False & df.col2.isnull()])
          col1  col2
      1  False   NaN
      2  False   NaN
      3  False   1.0
      
      # == is evaluated first because it is in parentheses
      print (df[(df.col1 == False) & (df.col2.isnull())])
          col1  col2
      1  False   NaN
      2  False   NaN
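Breaking the unparenthesized mask into its two steps shows why rows 1, 2 and 3 come back. This is a sketch on the same example frame; inner and mask are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# Without parentheses, & is applied first: False & <Series> is False everywhere...
inner = False & df.col2.isnull()
print(inner.tolist())  # [False, False, False, False]

# ...so the whole mask reduces to "col1 == False" and the isnull() test is lost.
mask = df.col1 == inner
print(mask.tolist())            # [False, True, True, True]
print(df[mask].index.tolist())  # [1, 2, 3]
```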
      

      I think I found it in the docs (section 6.16): & has higher operator precedence than ==:

      Operator                                          Description
      (lowest precedence)
      lambda                                            Lambda expression
      if – else                                         Conditional expression
      or                                                Boolean OR
      and                                               Boolean AND
      not x                                             Boolean NOT
      in, not in, is, is not, <, <=, >, >=, !=, ==      Comparisons, including membership tests and identity tests
      |                                                 Bitwise OR
      ^                                                 Bitwise XOR
      &                                                 Bitwise AND
      ...                                               (shifts, arithmetic and other rows omitted)
      (expressions...), [expressions...],               Binding or tuple display, list display,
      {key: value...}, {expressions...}                 dictionary display, set display
      (highest precedence)
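The ordering in this table can also be checked against the parser: whichever operator has the lowest precedence ends up at the root of the expression tree. The helper outermost below is hypothetical and only reports the ast class name of the root operator.

```python
import ast


def outermost(source: str) -> str:
    """Hypothetical helper: name of the operator Python puts at the root of
    the expression tree, i.e. the one with the LOWEST precedence."""
    node = ast.parse(source, mode="eval").body
    if isinstance(node, (ast.BinOp, ast.BoolOp)):
        return type(node.op).__name__
    if isinstance(node, ast.Compare):
        return type(node.ops[0]).__name__
    return type(node).__name__


print(outermost("x == y & z"))    # Eq: & binds tighter, so == is at the root
print(outermost("x & y == z"))    # Eq: parsed as (x & y) == z
print(outermost("x | y & z"))     # BitOr: | is looser than &
print(outermost("x == y and z"))  # And: boolean and is looser than ==
```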
      

      [Discussion]:

      • So which operator takes precedence? I'm still confused.
      • "The following table summarizes the operator precedence in Python, from lowest precedence (least binding) to highest precedence (most binding)." So I think & has higher precedence than ==. See the updated OP.