Pandas For Loop error - with and/if statements embedded

【Question title】: Pandas For Loop error - with and/if statements embedded
【Posted】: 2017-07-17 06:56:01
【Question】:

I have a time-series pandas.DataFrame, 'ES_Summary_Index1', as follows:

     Ticker_x                Date  Close_x 15M_Long 1H_Long Net_Long
0       ES H7 2016-10-18 13:44:59  2128.00        N     NaN         
1       ES H7 2016-10-18 13:59:59  2128.75        N     NaN         
2       ES H7 2016-10-18 14:14:59  2125.75        N     NaN         
3       ES H7 2016-10-18 14:29:59  2126.50        N       N         
4       ES H7 2016-10-18 14:44:59  2126.50        N     NaN         
5       ES H7 2016-10-18 16:14:59  2126.00        N     NaN         
6       ES H7 2016-10-18 16:44:59  2126.25        N     NaN         
7       ES H7 2016-10-18 17:59:59  2126.50        N     NaN         
8       ES H7 2016-10-18 18:14:59  2127.00        N     NaN         
9       ES H7 2016-10-18 19:14:59  2127.75        N     NaN         
10      ES H7 2016-10-18 19:44:59  2127.75        N     NaN         
11      ES H7 2016-10-18 19:59:59  2127.75        N     NaN         
12      ES H7 2016-10-18 20:44:59  2129.00        N     NaN         
13      ES H7 2016-10-18 21:29:59  2128.75        N       N         
14      ES H7 2016-10-18 21:44:59  2129.00        N     NaN

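Looking at the 15M_Long and 1H_Long columns: if both say "Y", I want the Net_Long column to say "Long". If only one or neither says "Y", I want the Net_Long column to stay blank or say "N" (either is fine).

First, I set the Net_Long column to blank: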

ES_Summary_Index1['Net_Long'] = ''

Next, I wrote a for loop to populate the Net_Long column:

for index, row in ES_Summary_Index1.iterrows():
    if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
        ES_Summary_Index1.loc['Net_Long'] = 'Long'
    else:
        ES_Summary_Index1.loc['Net_Long'] = 'N'

Unfortunately, I get the following error:

TypeError: unsupported operand type(s) for &: 'str' and 'float'

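...referring to the if statement above (if ES_Summary_Index1...). I have tried changing & to and, but that did not populate the Net_Long column the way I wanted. I have also tried == instead of is, but that did not work either. Can anyone help?

【Comments】:

The iterrows solution is slow; see the other solution, which uses numpy.where.

Tags: python pandas dataframe time-series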

【Solution 1】:

You need numpy.where with a boolean mask, which is vectorized and very fast:

mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
df['Net_Long'] = np.where(mask, 'Long', 'N')

print (df)
  Ticker_x                 Date  Close_x 15M_Long 1H_Long Net_Long
0    ES_H7  2016-10-18T13:44:59  2128.00        N     NaN        N
1    ES_H7  2016-10-18T13:59:59  2128.75        N     NaN        N
2    ES_H7  2016-10-18T19:59:59  2127.75        Y     NaN        N
3    ES_H7  2016-10-18T20:44:59  2129.00        N       Y        N
4    ES_H7  2016-10-18T21:29:59  2128.75        Y       Y     Long
5    ES_H7  2016-10-18T21:44:59  2129.00        N     NaN        N
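The same mask also behaves sensibly when the missing entries are real NaN values rather than the string 'NaN'. The following is a minimal, self-contained sketch under that assumption; the four-row frame is hypothetical sample data, not the asker's DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: 1H_Long holds real NaN values (floats),
# not the string 'NaN'.
df = pd.DataFrame({
    '15M_Long': ['N', 'Y', 'N', 'Y'],
    '1H_Long': [np.nan, np.nan, 'Y', 'Y'],
})

# Each comparison yields a boolean Series; a NaN cell compares as False.
# The parentheses are required because & binds tighter than ==.
mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
df['Net_Long'] = np.where(mask, 'Long', 'N')

print(df['Net_Long'].tolist())  # ['N', 'N', 'N', 'Long']
```

Only the last row has 'Y' in both columns, so it is the only one marked 'Long'.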

Timings:

#length of df is 600 rows
In [183]: %timeit (iterate(df))
10 loops, best of 3: 67.1 ms per loop

In [184]: %timeit (vectorize(df1))
1000 loops, best of 3: 1.49 ms per loop

#length of df is 6000 rows
In [177]: %timeit (iterate(df))
1 loop, best of 3: 681 ms per loop

In [178]: %timeit (vectorize(df1))
100 loops, best of 3: 3.23 ms per loop

#length of df is 60000 rows 
In [180]: %timeit (iterate(df))
1 loop, best of 3: 6.87 s per loop

In [181]: %timeit (vectorize(df1))
10 loops, best of 3: 20.8 ms per loop
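%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The version below is an illustration under assumptions: the 600-row frame is hypothetical, and vectorize and iterate are simplified variants that return the labels instead of writing a column. Absolute numbers depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 600-row frame with string flags in both columns.
df = pd.DataFrame({
    '15M_Long': ['N', 'Y', 'Y'] * 200,
    '1H_Long': ['NaN', 'Y', 'N'] * 200,
})

def vectorize(frame):
    # One boolean mask over whole columns, then a single np.where call.
    mask = (frame['15M_Long'] == 'Y') & (frame['1H_Long'] == 'Y')
    return np.where(mask, 'Long', 'N').tolist()

def iterate(frame):
    # Row-by-row loop; iterrows builds a Series for every row.
    out = []
    for _, row in frame.iterrows():
        both = row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y'
        out.append('Long' if both else 'N')
    return out

# Total wall-clock time for 10 calls of each function.
t_vec = timeit.timeit(lambda: vectorize(df), number=10)
t_iter = timeit.timeit(lambda: iterate(df), number=10)
print(f'vectorize: {t_vec:.4f} s, iterate: {t_iter:.4f} s (10 runs each)')
```

Both functions return the same labels, so the comparison measures only the cost of the looping strategy.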

Timing code:

import numpy as np
import pandas as pd

data = [x.strip().split() for x in """
    Ticker_x             Date  Close_x 15M_Long 1H_Long
    ES_H7 2016-10-18T13:44:59  2128.00        N     NaN
    ES_H7 2016-10-18T13:59:59  2128.75        N     NaN
    ES_H7 2016-10-18T19:59:59  2127.75        Y     NaN
    ES_H7 2016-10-18T20:44:59  2129.00        N       Y
    ES_H7 2016-10-18T21:29:59  2128.75        Y       Y
    ES_H7 2016-10-18T21:44:59  2129.00        N     NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
# repeat the 6 sample rows: *100 for 600 rows, *1000 for 6000 rows, *10000 for 60k rows
df = pd.concat([df]*1000).reset_index(drop=True)
print (df)
df1 = df.copy()

def vectorize(df):
    mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
    df['Net_Long'] = np.where(mask, 'Long', 'N')
    return (df)

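def iterate(df):
    df['Net_Long'] = ''

    for idx, row in df.iterrows():
        # Compare strings with ==, not is; write through df.at, because
        # assigning to `row` is not guaranteed to update df.
        if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
            df.at[idx, 'Net_Long'] = 'Long'
        else:
            df.at[idx, 'Net_Long'] = 'N'
    return df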

print (iterate(df)) 
print (vectorize(df1))

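【Comments】:

【Solution 2】:

Besides getting the logical test right, you should also work on the current row inside the loop. Your current code assigns to .loc['Net_Long'] without the row index, so it never updates the row being visited on each pass through the loop:

Code: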

df['Net_Long'] = ''

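for idx, row in df.iterrows():
    # Compare strings with ==, not is; write through df.at, because
    # assigning to `row` is not guaranteed to update df.
    if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
        df.at[idx, 'Net_Long'] = 'Long'
    else:
        df.at[idx, 'Net_Long'] = 'N'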

Test data:

import pandas as pd

data = [x.strip().split() for x in """
    Ticker_x             Date  Close_x 15M_Long 1H_Long
    ES_H7 2016-10-18T13:44:59  2128.00        N     NaN
    ES_H7 2016-10-18T13:59:59  2128.75        N     NaN
    ES_H7 2016-10-18T19:59:59  2127.75        Y     NaN
    ES_H7 2016-10-18T20:44:59  2129.00        N       Y
    ES_H7 2016-10-18T21:29:59  2128.75        Y       Y
    ES_H7 2016-10-18T21:44:59  2129.00        N     NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])

Produces:

  Ticker_x                 Date  Close_x 15M_Long 1H_Long Net_Long
0    ES_H7  2016-10-18T13:44:59  2128.00        N     NaN        N
1    ES_H7  2016-10-18T13:59:59  2128.75        N     NaN        N
2    ES_H7  2016-10-18T19:59:59  2127.75        Y     NaN        N
3    ES_H7  2016-10-18T20:44:59  2129.00        N       Y        N
4    ES_H7  2016-10-18T21:29:59  2128.75        Y       Y     Long
5    ES_H7  2016-10-18T21:44:59  2129.00        N     NaN        N

【Comments】:

【Solution 3】:

Replace the following line:

if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':

with

if ES_Summary_Index1.loc[index, '15M_Long']=='Y' and ES_Summary_Index1.loc[index, '1H_Long']=='Y':
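The reason this replacement helps comes down to operator precedence: & binds more tightly than comparison operators such as is and ==, so the original condition tries to evaluate 'Y' & <cell value> before any comparison happens. A minimal sketch, using the hypothetical stand-in variables a and b for the two cells, reproduces the error and shows the working form:

```python
import numpy as np

a = 'N'       # stand-in for a cell from 15M_Long
b = np.nan    # stand-in for a missing cell from 1H_Long (a float)

# Without parentheses, 'Y' & b is evaluated first, which raises the
# TypeError from the question. == is used here, but is has the same
# precedence, so the original condition parses the same way.
try:
    a == 'Y' & b == 'Y'
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# The boolean operator `and` has lower precedence than the comparisons,
# so each comparison is evaluated on its own first.
ok = (a == 'Y') and (b == 'Y')
print(ok)  # False
```

Using == rather than is also matters: is tests object identity, which is not a reliable way to compare string values.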

【Comments】:

Thanks for the help, I'll accept the answer in a few minutes!