【Question title】: Slicing a Pandas DataFrame with a logical (boolean) expression
【Posted】: 2018-12-11 16:12:39
【Question description】:

I get an exception when I try to slice my Pandas DataFrame with a logical expression.

My data has the following form:

df
    GDP_norm    SP500_Index_deflated_norm
Year        
1980    2.121190    0.769400
1981    2.176224    0.843933
1982    2.134638    0.700833
1983    2.233525    0.829402
1984    2.395658    0.923654
1985    2.497204    0.922986
1986    2.584896    1.09770

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 38 entries, 1980 to 2017
Data columns (total 2 columns):
GDP_norm                     38 non-null float64
SP500_Index_deflated_norm    38 non-null float64
dtypes: float64(2)
memory usage: 912.0 bytes
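To reproduce the problem without the original data, a small copy of the frame can be rebuilt from the rows printed above. This is a minimal sketch assuming only the first seven rows (1980 to 1986) are needed; the full frame has 38 rows. Note that every GDP_norm value in these seven rows is below 3.5, so the filter from the question selects nothing on this subset.

```python
import pandas as pd

# Rebuild the first seven rows shown in the question (the real frame covers 1980-2017).
df = pd.DataFrame(
    {
        "GDP_norm": [2.121190, 2.176224, 2.134638, 2.233525, 2.395658, 2.497204, 2.584896],
        "SP500_Index_deflated_norm": [0.769400, 0.843933, 0.700833, 0.829402,
                                      0.923654, 0.922986, 1.09770],
    },
    # Integer index named "Year", matching the printed frame.
    index=pd.Index(list(range(1980, 1987)), name="Year"),
)

df.info()
```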

The command is:

df[((df['GDP_norm'] >=3.5 & df['GDP_norm'] <= 4.5) & (df['SP500_Index_deflated_norm'] > 3)) |
   ((df['GDP_norm'] >= 4.0 & df['GDP_norm'] <= 5.0) & (df['SP500_Index_deflated_norm'] < 3.5))]

The error message is:

TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]

【Question comments】:

    Tags: python pandas slice logical-operators boolean-expression


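    【Solution 1】:

    I suggest creating the boolean masks separately, for better readability and easier error handling.

    Your code is missing the parentheses () around each individual comparison (compare the m1 and m2 masks below); the problem is operator precedence.

    From the Python docs, 6.16 Operator precedence: & has higher precedence than >=. Excerpt, listed from lowest to highest precedence: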

    Operator                                Description
    
    lambda                                  Lambda expression
    if – else                               Conditional expression
    or                                      Boolean OR
    and                                     Boolean AND
    not x                                   Boolean NOT
    in, not in, is, is not,                 Comparisons, including membership tests    
    <, <=, >, >=, !=, ==                    and identity tests
    |                                       Bitwise OR
    ^                                       Bitwise XOR
    &                                       Bitwise AND
    
    (expressions...), [expressions...],     Binding or tuple display, list display,       
    {key: value...}, {expressions...}       dictionary display, set display
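The precedence rule can be checked directly with the standard ast module, which shows how Python parses the unparenthesized expression from the question. In this sketch the names a and b are placeholders for the two Series; the second half uses plain integers (x = 8 is an arbitrary choice) to show that the missing parentheses change the result, not just the error.

```python
import ast

# Parse (without evaluating) the shape of the expression from the question.
tree = ast.parse("a >= 3.5 & b <= 4.5", mode="eval")
compare = tree.body
print(type(compare).__name__)                      # Compare: one chained comparison
print([type(op).__name__ for op in compare.ops])   # ['GtE', 'LtE']

# The middle operand of the chain is the BinOp 3.5 & b, because & binds tighter.
middle = compare.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)  # BinOp BitAnd

# Same precedence with plain integers:
x = 8
unparenthesized = x >= 3 & x <= 5      # parsed as x >= (3 & x) <= 5, i.e. 8 >= 0 <= 5
parenthesized = (x >= 3) & (x <= 5)    # True & False
print(unparenthesized, parenthesized)  # True False
```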
    

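    # Wrap every comparison in its own parentheses so it is evaluated before &
    m1 = (df['GDP_norm'] >= 3.5) & (df['GDP_norm'] <= 4.5)
    m2 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
    
    # Combine each GDP range with its S&P 500 condition
    m3 = m1 & (df['SP500_Index_deflated_norm'] > 3)
    m4 = m2 & (df['SP500_Index_deflated_norm'] < 3.5)
    
    # Keep the rows that satisfy either combined condition
    df[m3 | m4]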
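As a variation on the masks above, the range checks can also be written with Series.between, which includes both endpoints by default, so there are no comparison parentheses to forget. This is a sketch on hypothetical data: the six rows below are made up (not from the question) so that both branches of the filter match at least one row; it checks that both formulations select the same rows.

```python
import pandas as pd

# Hypothetical rows chosen so that both branches of the filter match something.
df = pd.DataFrame(
    {
        "GDP_norm": [3.8, 4.8, 4.2, 5.5, 3.0, 3.6],
        "SP500_Index_deflated_norm": [3.2, 2.0, 4.0, 1.0, 5.0, 2.5],
    },
    index=pd.Index([2000, 2001, 2002, 2003, 2004, 2005], name="Year"),
)
sp = df["SP500_Index_deflated_norm"]

# Parenthesized comparison masks, as in the answer.
m1 = (df["GDP_norm"] >= 3.5) & (df["GDP_norm"] <= 4.5)
m2 = (df["GDP_norm"] >= 4.0) & (df["GDP_norm"] <= 5.0)
res = df[(m1 & (sp > 3)) | (m2 & (sp < 3.5))]

# Same range masks via Series.between (both endpoints included by default).
b1 = df["GDP_norm"].between(3.5, 4.5)
b2 = df["GDP_norm"].between(4.0, 5.0)
res_between = df[(b1 & (sp > 3)) | (b2 & (sp < 3.5))]

print(res.index.tolist())          # [2000, 2001, 2002]
print(res.equals(res_between))     # True
```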
    


      【Solution 2】:

      You are running into chained comparisons. What happens is that the expression df['GDP_norm'] >=3.5 & df['GDP_norm'] <= 4.5 is evaluated as:

      df['GDP_norm'] >= (3.5 & df['GDP_norm']) <= 4.5
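Python expands a chained comparison a >= b <= c into (a >= b) and (b <= c), evaluating b only once. A minimal sketch of that rule, plus what happens when a Series sits in the chain: even if the & step had succeeded, the hidden and would have to turn a boolean Series into a single True/False, which pandas refuses. The Series s and its values are made up for illustration.

```python
import pandas as pd

# Chained comparison == pairwise comparisons joined by `and`.
a, b, c = 5, 3, 4
chained = a >= b <= c
expanded = (a >= b) and (b <= c)
print(chained, expanded)  # True True

# With a Series, `0.5 <= s` yields a boolean Series; the implicit `and`
# then needs its truth value, which raises ValueError.
s = pd.Series([1.0, 2.0, 3.0])  # hypothetical values
error_message = None
try:
    _ = 0.5 <= s <= 2.5
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # message begins "The truth value of a Series is ambiguous"
```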
      

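      Evaluated that way, the expression fails: as your error message says, a float64 array cannot be compared with a scalar of type bool. Instead, use parentheses to isolate each boolean mask and assign each mask to a variable: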

      m1 = (df['GDP_norm'] >= 3.5) & (df['GDP_norm'] <= 4.5)
      m2 = df['SP500_Index_deflated_norm'] > 3
      
      m3 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
      m4 = df['SP500_Index_deflated_norm'] < 3.5
      
      res = df[(m1 & m2) | (m3 & m4)]
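Another option, not used in either answer, is DataFrame.query: its expression parser accepts chained comparisons such as 3.5 <= GDP_norm <= 4.5 and the keywords and / or, so the original intent can be written almost literally. This sketch reuses the hypothetical six-row frame (made-up values, not the question's data) and checks the result against the mask approach above.

```python
import pandas as pd

# Hypothetical rows chosen so that both branches of the filter match something.
df = pd.DataFrame(
    {
        "GDP_norm": [3.8, 4.8, 4.2, 5.5, 3.0, 3.6],
        "SP500_Index_deflated_norm": [3.2, 2.0, 4.0, 1.0, 5.0, 2.5],
    },
    index=pd.Index([2000, 2001, 2002, 2003, 2004, 2005], name="Year"),
)

# Masks as in the answer above.
m1 = (df["GDP_norm"] >= 3.5) & (df["GDP_norm"] <= 4.5)
m2 = df["SP500_Index_deflated_norm"] > 3
m3 = (df["GDP_norm"] >= 4.0) & (df["GDP_norm"] <= 5.0)
m4 = df["SP500_Index_deflated_norm"] < 3.5
res = df[(m1 & m2) | (m3 & m4)]

# query() parses chained comparisons itself, so no extra parentheses are needed.
res_query = df.query(
    "(3.5 <= GDP_norm <= 4.5 and SP500_Index_deflated_norm > 3)"
    " or (4.0 <= GDP_norm <= 5.0 and SP500_Index_deflated_norm < 3.5)"
)

print(res_query.index.tolist())  # [2000, 2001, 2002]
```

The mask approach keeps each condition inspectable as a variable; query trades that for a more compact, readable filter string.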
      

