Pandas: change value in column by condition with value from the same column (answers)

【Question title】: Pandas: change value in column by condition with value from the same column
【Posted】: 2020-07-19 21:56:57
【Question】:

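I need to replace values in a column named 'month' with values from that same column, based on another column, 'step_name'. Where df.step_name.str.contains('step1'), I want to use the 'month' value from the row where df.step_name.str.contains('step2'). I used df.loc[], but it just deletes the 'month' value for the 'step1' rows.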

for i in set(df['id']): df.loc[(df.id.str.contains(i))&(df.step_name.str.contains('step1')),'month'] = df.loc[(df.id.str.contains(i))&(df.step_name.str.contains('step2')),'month']

【Comments】:

Tags: python python-3.x pandas dataframe contains

【Solution 1】:

Assume that the source DataFrame contains:

   id step_name     month
0  10     step1   January
1  10     step2     March
2  12     step1  February
3  12     step2     April
4  14     step1       May

So in the rows with index 0 and 2 (step_name == 'step1'), the month column should be updated with the value from the next row (step_name == 'step2', same id).

To do this, run:

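# Index by id so that update() aligns rows on id.
df.set_index('id', inplace=True)
# Overwrite month with the step2 value of the same id.
df.update(df[df.step_name == 'step2']['month'])
df.reset_index(inplace=True)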

The result is:

   id step_name  month
0  10     step1  March
1  10     step2  March
2  12     step1  April
3  12     step2  April
4  14     step1    May

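Note that update actually updates both rows with the respective id, but for the row with step_name == 'step2' nothing changes. The row with id 14 has no step2 counterpart, so it keeps its own month.

In my opinion, my solution is more pandasonic than your loop, which performs a separate update for each id.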
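As a point of comparison, here is a minimal sketch of another loop-free variant that looks up the step2 month per id with Series.map instead of update. It is not part of the answer above. The sample data is the frame shown earlier, the name step2_month is made up for illustration, and the sketch assumes each id has at most one step2 row.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [10, 10, 12, 12, 14],
    "step_name": ["step1", "step2", "step1", "step2", "step1"],
    "month": ["January", "March", "February", "April", "May"],
})

# One month per id, taken from the step2 row
# (assumption: at most one step2 row per id, so the index is unique).
step2_month = df.loc[df["step_name"] == "step2"].set_index("id")["month"]

# Look up every row's id in that Series. Ids without a step2 row
# come back as NaN and fall back to their original month.
df["month"] = df["id"].map(step2_month).fillna(df["month"])

print(df)
```

The lookup Series is built only from step2 rows, so the original index of df is never changed and no set_index/reset_index round trip is needed.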

【Discussion】:

【Solution 2】:

I see what you did there! It is subtle, and it is a nice bug.

First, I will do a quick cleanup so that we can see what is going on:

# Your code, with the masks named.
is_step1 = df.step_name.str.contains('step1')
is_step2 = df.step_name.str.contains('step2')

for i in set(df['id']):
    is_id = df.id.str.contains(i)
    # The right-hand side is a Series, so pandas aligns it by index label.
    df.loc[is_id & is_step1, 'month'] = df.loc[is_id & is_step2, 'month']

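You are using two masks that select different rows. When the right-hand side of the assignment is a Series, pandas aligns it to the left-hand rows by index label, so the step2 values never line up with the step1 rows and NaN is written instead.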

'''
mask1 mask2  => df[mask1] df[mask2]
1     0         value1    NaN        -> value1 = NaN
0     1         NaN       value2
0     0         NaN       NaN
0     0         NaN       NaN
'''
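The alignment sketched in the table above can be reproduced on a two-row frame. This is a minimal illustration, not the asker's data: the frame and the names rhs, aligned and broken are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["10", "10"],
    "step_name": ["step1", "step2"],
    "month": ["January", "March"],
})

is_step1 = df["step_name"].str.contains("step1")
is_step2 = df["step_name"].str.contains("step2")

# Right-hand side of the assignment: a Series carrying index label 1.
rhs = df.loc[is_step2, "month"]

# Labels of the rows being assigned to: index label 0.
target_index = df.loc[is_step1].index

# Re-labelling rhs onto the target rows finds no matching label,
# so the only value left is missing.
aligned = rhs.reindex(target_index)
print(aligned)

# The same thing happens implicitly inside the .loc assignment.
broken = df.copy()
broken.loc[is_step1, "month"] = rhs
print(broken)
```

The step1 row ends up with a missing month even though rhs holds 'March', because the two selections share no index label.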

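If you use an array instead, pandas has no index to align on and maps the array by position onto the values to be filled on the left-hand side of the assignment...

df.loc[is_id & is_step1, 'month'] = df.loc[is_id & is_step2, 'month'].values

...and this is what happens: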

'''
mask1 mask2  => df[mask1] df[mask2].values
1     0         value1    value2            -> value1 = value2
0     1         NaN       
0     0         NaN       
0     0         NaN       
'''
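Put together, the loop with a positional right-hand side might look like the sketch below. The sample frame is made up and assumes every id has exactly one step1 row and one step2 row. If an id had no step2 row, the array would be empty and the assignment would raise an error. The sketch also compares ids with == rather than str.contains, since a substring match would let id '1' match '10'; that is a choice made here, not something from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["10", "10", "12", "12"],
    "step_name": ["step1", "step2", "step1", "step2"],
    "month": ["January", "March", "February", "April"],
})

is_step1 = df["step_name"].str.contains("step1")
is_step2 = df["step_name"].str.contains("step2")

for i in df["id"].unique():
    is_id = df["id"] == i  # exact match on the id
    # to_numpy() drops the index, so the value is assigned by position.
    df.loc[is_id & is_step1, "month"] = df.loc[is_id & is_step2, "month"].to_numpy()

print(df)
```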

Now, if for example you want to swap the months of step1 and step2...

# N.B. I don't say it is best practice, but it works!
# Assumes every id has exactly one step1 row and one step2 row.
new_df = df.sort_values('id')

is_step1 = new_df.step_name.str.contains('step1')
is_step2 = new_df.step_name.str.contains('step2')

# Save the step1 months before they are overwritten.
c = new_df.loc[is_step1, 'month'].values
new_df.loc[is_step1, 'month'] = new_df.loc[is_step2, 'month'].values
new_df.loc[is_step2, 'month'] = c
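To see why the sort matters, here is the same swap run on a small made-up frame whose rows are deliberately out of order. It assumes each id has exactly one step1 row and one step2 row; the name step1_months is chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [12, 10, 10, 12],
    "step_name": ["step2", "step1", "step2", "step1"],
    "month": ["April", "January", "March", "February"],
})

# After sorting by id, the n-th step1 row and the n-th step2 row
# belong to the same id, so positional assignment pairs them correctly.
new_df = df.sort_values("id", kind="stable")

is_step1 = new_df["step_name"] == "step1"
is_step2 = new_df["step_name"] == "step2"

# Save the step1 months before they are overwritten.
step1_months = new_df.loc[is_step1, "month"].to_numpy().copy()
new_df.loc[is_step1, "month"] = new_df.loc[is_step2, "month"].to_numpy()
new_df.loc[is_step2, "month"] = step1_months

print(new_df.sort_index())
```

Without the sort, the first step1 row (id 10) would be paired with the first step2 row (id 12), and months would move between ids.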

I believe Valdi_Bo's solution is the best one. Accept his answer. ;)

【Discussion】: