【Question Title】: Python DataFrames For Loop with If Statement not working
【Posted】: 2017-02-21 19:59:59
【Question】:

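I have a DataFrame named ES_15M_Summary that holds coefficients/betas in the column ES_15M_Summary['Rolling_OLS_Coefficient'].

If the value in that column ('Rolling_OLS_Coefficient') is greater than 0.08, I want a new column titled 'Long' to hold a binary 'Y'. If the value is less than 0.08, I want 'Long' to be 'NaN' or just 'N' (either is fine).

So I am writing a for loop to run through the column. First, I created a new column called 'Long' and set it to NaN: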

ES_15M_Summary['Long'] = np.nan

Then I wrote the following for loop:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'

I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

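...which refers to the if statement line shown above (if ... > .08:). I'm not sure why I am getting this error or what is wrong with the for loop. Any help is appreciated.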

【Discussion】:

    Tags: python pandas for-loop dataframe


    【Solution 1】:

    I think it is better to use numpy.where:

    mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
    ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
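
As background on the original error, the sketch below shows why the `if` in the question fails: comparing a whole column to a number yields one boolean per row, and Python cannot reduce that Series to a single True/False. The three sample values and the names `df`, `comparison`, `error_message`, and `first_is_long` are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data; the values are assumptions, not the asker's real data.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing the whole column gives a boolean Series, one entry per row.
comparison = df['Rolling_OLS_Coefficient'] > .08
print(comparison.tolist())

# Using that Series directly in an `if` raises the ValueError from the question.
error_message = None
try:
    if comparison:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A single cell is a scalar, so comparing it yields one unambiguous boolean.
first_is_long = bool(df.loc[0, 'Rolling_OLS_Coefficient'] > .08)
print(first_is_long)
```

This is why the vectorized form passes the whole boolean Series to np.where instead of testing it with `if`.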
    

    Sample:

    import numpy as np
    import pandas as pd

    ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
    print (ES_15M_Summary)
       Rolling_OLS_Coefficient
    0                     0.07
    1                     0.01
    2                     0.09
    
    mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
    ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
    print (ES_15M_Summary)
       Rolling_OLS_Coefficient Long
    0                     0.07    N
    1                     0.01    N
    2                     0.09    Y
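
The question also allows NaN instead of 'N' for rows at or below the threshold. One possible way to get that, sketched here under the assumption that a missing value is preferred over a text label, is to map the boolean mask through a dictionary. The name `df` and the sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data; the values are assumptions.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

mask = df['Rolling_OLS_Coefficient'] > .08

# True becomes 'Y'; False becomes a real missing value rather than the string 'NaN'.
df['Long'] = mask.map({True: 'Y', False: np.nan})
print(df)
```

A real missing value can later be detected with `df['Long'].isna()`, which the string 'NaN' cannot.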
    

    A loop, which is a very slow solution:

    for index, row in ES_15M_Summary.iterrows():
        if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            ES_15M_Summary.loc[index,'Long'] = 'Y'
        else:
            ES_15M_Summary.loc[index,'Long'] = 'N'
    print (ES_15M_Summary)
       Rolling_OLS_Coefficient Long
    0                     0.07    N
    1                     0.01    N
    2                     0.09    Y
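
If a row-wise loop is still wanted, a variant worth considering is to read the scalar from the `row` that iterrows already yields, collect the labels in a list, and assign the column once at the end. This is only a sketch; the name `labels` and the sample values are assumptions, and it remains a Python-level loop rather than a vectorized operation.

```python
import pandas as pd

# Illustrative data; the values are assumptions.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

labels = []
for index, row in df.iterrows():
    # row['Rolling_OLS_Coefficient'] is a single number, so the comparison is unambiguous.
    labels.append('Y' if row['Rolling_OLS_Coefficient'] > .08 else 'N')

# Assign the whole column in one step instead of writing cell by cell with .loc.
df['Long'] = labels
print(df)
```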
    

    Timings:

    #3000 rows
    ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
    #print (ES_15M_Summary)
    
    
    def loop(df):
        # operate on the df argument rather than the global DataFrame
        for index, row in df.iterrows():
            if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
                df.loc[index,'Long'] = 'Y'
            else:
                df.loc[index,'Long'] = 'N'
        return df
    
    print (loop(ES_15M_Summary))
    
    
    In [51]: %timeit (loop(ES_15M_Summary))
    1 loop, best of 3: 2.38 s per loop
    
    In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    1000 loops, best of 3: 555 µs per loop
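
The `%timeit` lines above only work inside IPython. A rough equivalent for a plain Python script, using the standard `timeit` module, might look like the sketch below. The helper names `loop` and `vectorized`, the use of copies, and pre-creating 'Long' as an empty text column are assumptions of this sketch, and the measured times will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# 3000 rows, as in the timing above
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * 1000})

def loop(frame):
    out = frame.copy()
    out['Long'] = ''  # assumption: pre-create the column as text before filling it
    for index, row in frame.iterrows():
        if row['Rolling_OLS_Coefficient'] > .08:
            out.loc[index, 'Long'] = 'Y'
        else:
            out.loc[index, 'Long'] = 'N'
    return out

def vectorized(frame):
    out = frame.copy()
    out['Long'] = np.where(out['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return out

# Run each variant once and report wall-clock seconds.
loop_seconds = timeit.timeit(lambda: loop(df), number=1)
vectorized_seconds = timeit.timeit(lambda: vectorized(df), number=1)
print(f'loop: {loop_seconds:.4f} s, np.where: {vectorized_seconds:.4f} s')
```

Both helpers work on a copy, so repeated timing runs do not modify the original DataFrame.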
    

    【Comments】:

    • Thanks, I am using the for loop you provided. Much appreciated.