pandas assigning series view to a series view doesn't work? (Answers)

【Question title】: pandas assigning series view to a series view doesn't work?
【Posted】: 2013-05-29 03:09:21
【Question】:

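I'm trying to take a sliced view of a Series (indexed by a boolean condition), process it, and then assign the result back to that same boolean-indexed slice. The LHS and RHS of the assignment are both Series with matching indices, yet for some unknown reason the assignment ends up being a scalar (see bottom). How do I get the assignment I want? (I checked SO and the pandas 0.11.0 docs for anything relevant.)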

import numpy as np
import pandas as pd

# A dataframe with sample data and some boolean conditional
df = pd.DataFrame(data={'x': range(1,20)})
df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )

# Create a new col and selectively assign to it... elsewhere being NaN...
df['newcol'] = np.nan
# This attempted assign to a view of the df doesn't work (in reality the RHS expression would actually be a return value from somefunc)
df.ix[df.cond, df.columns.get_loc('newcol')] = 2* df.ix[df.cond, df.columns.get_loc('x')]
# yet a scalar assign does...
df.ix[df.cond, df.columns.get_loc('newcol')] = 99.
# Likewise bad trying to use -df.cond as the logical index:
df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]

Currently I just get a silly scalar assignment:

>>> df.ix[-df.cond, df.columns.get_loc('newcol')] = 2* df.ix[-df.cond, df.columns.get_loc('x')]
>>> df
     x   cond  newcol
0    1   True     NaN
1    2  False       4
2    3  False       4
3    4   True     NaN
4    5  False       4
5    6  False       4
6    7   True     NaN
7    8  False       4
8    9  False       4
9   10   True     NaN
10  11  False       4
11  12  False       4
12  13   True     NaN
13  14  False       4
14  15  False       4
15  16   True     NaN
16  17  False       4
17  18  False       4
18  19   True     NaN

【Question comments】:

Tags: python pandas dataframe slice series

【Answer 1】:

In [21]: df = pd.DataFrame(data={'x': range(1,20)})

In [22]: df['cond'] = df.x.apply(lambda xx: ((xx%3)==1) )

In [23]: df
Out[23]: 
     x   cond
0    1   True
1    2  False
2    3  False
3    4   True
4    5  False
5    6  False
6    7   True
7    8  False
8    9  False
9   10   True
10  11  False
11  12  False
12  13   True
13  14  False
14  15  False
15  16   True
16  17  False
17  18  False
18  19   True

In [24]: df['newcol'] = 2*df.loc[df.cond, 'x']

In [25]: df
Out[25]: 
     x   cond  newcol
0    1   True       2
1    2  False     NaN
2    3  False     NaN
3    4   True       8
4    5  False     NaN
5    6  False     NaN
6    7   True      14
7    8  False     NaN
8    9  False     NaN
9   10   True      20
10  11  False     NaN
11  12  False     NaN
12  13   True      26
13  14  False     NaN
14  15  False     NaN
15  16   True      32
16  17  False     NaN
17  18  False     NaN
18  19   True      38


In [10]: def myfunc(df_):
   ....:     return 2 * df_
   ....: 

In [26]: df['newcol'] = myfunc(df.ix[df.cond, df.columns.get_loc('newcol')])

In [27]: df
Out[27]: 
     x   cond  newcol
0    1   True       4
1    2  False     NaN
2    3  False     NaN
3    4   True      16
4    5  False     NaN
5    6  False     NaN
6    7   True      28
7    8  False     NaN
8    9  False     NaN
9   10   True      40
10  11  False     NaN
11  12  False     NaN
12  13   True      52
13  14  False     NaN
14  15  False     NaN
15  16   True      64
16  17  False     NaN
17  18  False     NaN
18  19   True      76
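For readers on a current pandas release, where `.ix` is no longer available, the same idea can be sketched with `.loc`, a boolean mask, and a plain column label. This is only an illustration under assumptions: `myfunc` is a hypothetical stand-in for the real transformation, and the mask is built with a vectorized comparison instead of `apply`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(1, 20)})
df["cond"] = df["x"] % 3 == 1          # boolean mask, same rows as the question


def myfunc(s):
    """Hypothetical stand-in for the real transformation of the slice."""
    return 2 * s


# Start with an all-NaN float column, then fill only the rows where cond is True.
df["newcol"] = np.nan
df.loc[df["cond"], "newcol"] = myfunc(df.loc[df["cond"], "x"])

print(df.head(4))
```

The same mask appears on both sides, so the right-hand Series carries exactly the index labels being assigned to. To target the other rows, `~df["cond"]` is the usual way to negate a boolean mask.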

【Comments】:

【Answer 2】:

I found this workaround:

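# Build an all-NaN Series, fill the selected rows, then attach it as a column.
tmp = pd.Series(np.repeat(np.nan, len(df)))
# The same mask must be used on both sides so the indices line up.
tmp[df.cond] = 2* df.loc[df.cond, 'x']
df['newcol'] = tmp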

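Oddly, the following works sometimes (assigning the slice to the whole Series), but it fails with "AssertionError: Length of values does not match length of index" for a more complicated RHS.

(According to the pandas docs, the index of an RHS Series should be aligned with the LHS, at least if the LHS is a DataFrame. But not if it is a Series? Is this a bug?)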

>>> df['newcol'] = 2* df.loc[df.cond, 'x']
>>> df
     x   cond  newcol
0    1   True       2
1    2  False     NaN
2    3  False     NaN
3    4   True       8
4    5  False     NaN
5    6  False     NaN
6    7   True      14
7    8  False     NaN
8    9  False     NaN
9   10   True      20
10  11  False     NaN
11  12  False     NaN
12  13   True      26
13  14  False     NaN
14  15  False     NaN
15  16   True      32
16  17  False     NaN
17  18  False     NaN
18  19   True      38
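The difference between an index-aligned Series and a bare array on the right-hand side can be sketched as follows. This is an illustration run against a current pandas release, not a reproduction of the 0.11.0 behavior; the column name `from_array` is made up for the example, and the exception type is caught broadly because it has differed between versions.

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 20)})
cond = df["x"] % 3 == 1

# A Series on the RHS is aligned on the index; rows without a match become NaN.
df["newcol"] = 2 * df.loc[cond, "x"]

# A bare array has no index to align on, so its length must equal len(df).
values = (2 * df.loc[cond, "x"]).to_numpy()   # only 7 values for 19 rows
try:
    df["from_array"] = values
    length_error = None
except (ValueError, AssertionError) as exc:
    length_error = exc

print(df["newcol"].notna().sum(), "rows filled;", "error:", length_error)
```

If a function returns an array or list instead of a Series, wrapping the result in `pd.Series(values, index=df.index[cond])` before assigning is one way to restore the alignment.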

Jeff, it's strange that we can assign to df['newcol'] (which should be a copy rather than a view, right?) when we do this:

df['newcol'] = 2* df.loc[df.cond, 'x']

but not when we do the same thing with an RHS that comes from a function:

def myfunc(df_):
    """Some func transforming and returning said Series slice"""
    return 2* df_

df['newcol'] = myfunc( df.ix[df.cond, df.columns.get_loc('newcol')] )

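【Comments】:

That's correct; what did you use on the RHS that gave you the assertion error? It has to align with the index or, in the case of a list/NumPy array, be of equal length.
Jeff - see the addendum at the bottom.
That all works; can you show the actual output you are getting? df['newcol'] is a Series whose data in this case is a view (since it is float data), so modifications affect the frame. That is not always the case, though: say you had an int dtype and assigned np.nan to it; you would then be modifying a copy, which is then assigned back to the frame.
FYI, df.columns.get_loc('newcol') isn't needed; just use 'newcol'.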
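The dtype point in the comments can be illustrated with a small sketch: integer data cannot hold NaN, so aligning an integer slice against the full index yields a new float Series rather than something that shares memory with the original. The variable names below are made up for the example, and it says nothing about when pandas returns a view versus a copy, which depends on the version.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])          # integer data
mask = s % 2 == 1                    # True for the values 1 and 3

part = 2 * s[mask]                   # still integer, index labels 0 and 2
aligned = part.reindex(s.index)      # labels 1 and 3 have no match, so NaN

print(part.dtype, aligned.dtype)
```

This is the same upcast that happens when a shorter integer Series is assigned to a DataFrame column and aligned against the frame's index.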