【Question title】: Pandas for loop error with embedded and/if statements
【Posted】: 2017-07-17 06:56:01
【Question】:

I have a time-series pandas.DataFrame, 'ES_Summary_Index1', as follows:

     Ticker_x                Date  Close_x 15M_Long 1H_Long Net_Long
0       ES H7 2016-10-18 13:44:59  2128.00        N     NaN         
1       ES H7 2016-10-18 13:59:59  2128.75        N     NaN         
2       ES H7 2016-10-18 14:14:59  2125.75        N     NaN         
3       ES H7 2016-10-18 14:29:59  2126.50        N       N         
4       ES H7 2016-10-18 14:44:59  2126.50        N     NaN         
5       ES H7 2016-10-18 16:14:59  2126.00        N     NaN         
6       ES H7 2016-10-18 16:44:59  2126.25        N     NaN         
7       ES H7 2016-10-18 17:59:59  2126.50        N     NaN         
8       ES H7 2016-10-18 18:14:59  2127.00        N     NaN         
9       ES H7 2016-10-18 19:14:59  2127.75        N     NaN         
10      ES H7 2016-10-18 19:44:59  2127.75        N     NaN         
11      ES H7 2016-10-18 19:59:59  2127.75        N     NaN         
12      ES H7 2016-10-18 20:44:59  2129.00        N     NaN         
13      ES H7 2016-10-18 21:29:59  2128.75        N       N         
14      ES H7 2016-10-18 21:44:59  2129.00        N     NaN                 

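Looking at the 15M_Long and 1H_Long columns: if both say 'Y', I want the Net_Long column to say 'Long'. If only one or neither says 'Y', I want the Net_Long column to stay blank or say 'N' (either is fine).

First, I set the Net_Long column to blank: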

ES_Summary_Index1['Net_Long'] = ''

Next, I wrote a for loop to populate the Net_Long column:

for index, row in ES_Summary_Index1.iterrows():
    if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
        ES_Summary_Index1.loc['Net_Long'] = 'Long'
    else:
        ES_Summary_Index1.loc['Net_Long'] = 'N'

Unfortunately, I get the following error:

TypeError: unsupported operand type(s) for &: 'str' and 'float'

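...referring to the if statement above (if ES_Summary_Index1...). I tried changing & to and, but that did not populate the Net_Long column the way I wanted. I also tried == instead of is, but that did not work either. Can anyone help?

【Comments】:

  • The iterrows solution is slow; see the other solution, which uses numpy.where.

Tags: python pandas dataframe time-series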


【Solution 1】:

You need a boolean mask with numpy.where, which is vectorized and very fast:

mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
df['Net_Long'] = np.where(mask, 'Long', 'N')

print (df)
  Ticker_x                 Date  Close_x 15M_Long 1H_Long Net_Long
0    ES_H7  2016-10-18T13:44:59  2128.00        N     NaN        N
1    ES_H7  2016-10-18T13:59:59  2128.75        N     NaN        N
2    ES_H7  2016-10-18T19:59:59  2127.75        Y     NaN        N
3    ES_H7  2016-10-18T20:44:59  2129.00        N       Y        N
4    ES_H7  2016-10-18T21:29:59  2128.75        Y       Y     Long
5    ES_H7  2016-10-18T21:44:59  2129.00        N     NaN        N
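The question also allows Net_Long to stay blank instead of saying 'N', and its real data holds genuine missing values in 1H_Long. A minimal sketch of that variant follows, assuming a small hypothetical frame with only the two flag columns; comparing a missing value with 'Y' yields False, so the mask needs no extra NaN handling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: only the two flag columns, with real missing values.
df = pd.DataFrame({
    '15M_Long': ['N', 'Y', 'N', 'Y'],
    '1H_Long': [np.nan, np.nan, 'Y', 'Y'],
})

# Each comparison returns a boolean Series; the parentheses are required
# because & binds more tightly than ==.
mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')

# Leave non-matching rows blank instead of writing 'N'.
df['Net_Long'] = np.where(mask, 'Long', '')

print(df['Net_Long'].tolist())  # ['', '', '', 'Long']
```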

Timings

#length of df is 600 rows
In [183]: %timeit (iterate(df))
10 loops, best of 3: 67.1 ms per loop

In [184]: %timeit (vectorize(df1))
1000 loops, best of 3: 1.49 ms per loop

#length of df is 6000 rows
In [177]: %timeit (iterate(df))
1 loop, best of 3: 681 ms per loop

In [178]: %timeit (vectorize(df1))
100 loops, best of 3: 3.23 ms per loop

#length of df is 60000 rows 
In [180]: %timeit (iterate(df))
1 loop, best of 3: 6.87 s per loop

In [181]: %timeit (vectorize(df1))
10 loops, best of 3: 20.8 ms per loop

Timing code

import numpy as np
import pandas as pd

data = [x.strip().split() for x in """
    Ticker_x             Date  Close_x 15M_Long 1H_Long
    ES_H7 2016-10-18T13:44:59  2128.00        N     NaN
    ES_H7 2016-10-18T13:59:59  2128.75        N     NaN
    ES_H7 2016-10-18T19:59:59  2127.75        Y     NaN
    ES_H7 2016-10-18T20:44:59  2129.00        N       Y
    ES_H7 2016-10-18T21:29:59  2128.75        Y       Y
    ES_H7 2016-10-18T21:44:59  2129.00        N     NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
# the sample has 6 rows: multiply by 100 for 600 rows, 1000 for 6000 rows, 10000 for 60k rows
df = pd.concat([df]*1000).reset_index(drop=True)
print (df)
df1 = df.copy()

def vectorize(df):
    mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
    df['Net_Long'] = np.where(mask, 'Long', 'N')
    return (df)

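def iterate(df):
    df['Net_Long'] = ''

    # Write through df.at: the row yielded by iterrows() may be a copy,
    # so assigning to it is not guaranteed to update df.
    for idx, row in df.iterrows():
        if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
            df.at[idx, 'Net_Long'] = 'Long'
        else:
            df.at[idx, 'Net_Long'] = 'N'
    return df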

print (iterate(df)) 
print (vectorize(df1))

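【Comments】:

    【Solution 2】:

    Besides getting the logical test right, you should also access the row directly in the loop. Your current code assigns to .loc['Net_Long'] on every pass through the loop, which targets a row labeled 'Net_Long' rather than the Net_Long cell of the current row.

    Code: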

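    df['Net_Long'] = ''

    for idx, row in df.iterrows():
        # Compare with ==, not is: identity checks on strings are unreliable.
        if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
            # Write through df.at; the row from iterrows() may be a copy.
            df.at[idx, 'Net_Long'] = 'Long'
        else:
            df.at[idx, 'Net_Long'] = 'N'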
    

    Test data:

    import pandas as pd
    
    data = [x.strip().split() for x in """
        Ticker_x             Date  Close_x 15M_Long 1H_Long
        ES_H7 2016-10-18T13:44:59  2128.00        N     NaN
        ES_H7 2016-10-18T13:59:59  2128.75        N     NaN
        ES_H7 2016-10-18T19:59:59  2127.75        Y     NaN
        ES_H7 2016-10-18T20:44:59  2129.00        N       Y
        ES_H7 2016-10-18T21:29:59  2128.75        Y       Y
        ES_H7 2016-10-18T21:44:59  2129.00        N     NaN
    """.split('\n')[1:-1]]
    df = pd.DataFrame(data=data[1:], columns=data[0])
    

    Output:

      Ticker_x                 Date  Close_x 15M_Long 1H_Long Net_Long
    0    ES_H7  2016-10-18T13:44:59  2128.00        N     NaN        N
    1    ES_H7  2016-10-18T13:59:59  2128.75        N     NaN        N
    2    ES_H7  2016-10-18T19:59:59  2127.75        Y     NaN        N
    3    ES_H7  2016-10-18T20:44:59  2129.00        N       Y        N
    4    ES_H7  2016-10-18T21:29:59  2128.75        Y       Y     Long
    5    ES_H7  2016-10-18T21:44:59  2129.00        N     NaN        N
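One caveat when looping with iterrows: on a frame with mixed column types, such as the question's data with its float Close_x column, each yielded row is a copy, so assigning to it leaves the DataFrame untouched. The sketch below uses a small hypothetical frame to show the difference between writing to the row and writing through df.at; the 'Long' placeholder written in the first loop is only there to make the effect visible.

```python
import pandas as pd

# Hypothetical sample; the float column makes the frame mixed-dtype.
df = pd.DataFrame({
    'Close_x': [2127.75, 2128.75],
    '15M_Long': ['Y', 'Y'],
    '1H_Long': [float('nan'), 'Y'],
})
df['Net_Long'] = ''

# Writing to the yielded row only changes a temporary copy.
for _, row in df.iterrows():
    row['Net_Long'] = 'Long'
after_row_write = df['Net_Long'].tolist()
print(after_row_write)  # ['', '']

# Writing through df.at updates the DataFrame itself.
for idx, row in df.iterrows():
    both_yes = row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y'
    df.at[idx, 'Net_Long'] = 'Long' if both_yes else 'N'
after_at_write = df['Net_Long'].tolist()
print(after_at_write)  # ['N', 'Long']
```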
    

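    【Comments】:

      【Solution 3】:

      Replace the first line below with the second, which uses == instead of is and and instead of &: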

      if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
      

      if ES_Summary_Index1.loc[index, '15M_Long']=='Y' and ES_Summary_Index1.loc[index, '1H_Long']=='Y':
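A minimal sketch of why the original condition raised the TypeError, using two hypothetical scalars in place of the DataFrame cells. The bitwise & operator binds more tightly than comparison operators (is and == share the same precedence), so Python tries to compute 'Y' & <cell value> before any comparison happens. The sketch uses == throughout, since is against a string literal emits a SyntaxWarning on recent Python versions.

```python
a = 'N'            # stand-in for a 15M_Long cell
b = float('nan')   # stand-in for a missing 1H_Long cell

try:
    # Parsed as the chained comparison a == ('Y' & b) == 'Y',
    # so 'Y' & b is evaluated first and fails.
    a == 'Y' & b == 'Y'
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Either fix evaluates the two comparisons first and then combines them.
with_and = a == 'Y' and b == 'Y'
with_parens = (a == 'Y') & (b == 'Y')
print(with_and, with_parens)  # False False
```

For scalar cells inside a loop, and is the natural choice; the parenthesized & form is the one that carries over to whole columns.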
      

      【Comments】:

      • Thanks for the help, I'll accept the answer in a few minutes!