【Question title】: Add two columns to a pandas DataFrame based on condition
【Posted】: 2021-07-19 10:19:56
【Question】:

I am trying to add two new columns whose values depend on two conditions.

Sample source data for the left and right DataFrames:

left_df:

id     rec_type  end_date
13759  U         20210113
23806  N         NaN
21347  U         20210113
36904  N         NaN

right_df:

id
23806
21347

Expected output:

id     rec_type  end_date  _merge     error_code  error_description
13759  U         20210113  left_only  601         update record not available in right table
23806  N         NaN       both       0           0
21347  U         20210113  both       0           0
36904  N         NaN       left_only  602         New record not available in right table

I am using numpy (np) select to do this, as shown in the code below, but I get an error.

import pandas as pd
import numpy as np

merged_df = pd.merge(left_df, right_df,
                     how='outer',
                     on=['id'],
                    indicator=True)

merged_df = merged_df.query('_merge != "right_only"')

conditions = [((merged_df['_merge'] == "left_only") &
               (merged_df['rec_type'] == "U") &
               (merged_df['end_date'].notnull())),
              ((merged_df['_merge'] == "left_only") &
               (merged_df['rec_type'] == "N") &
               (merged_df['end_date'].isnull()))]

error_codes = dict()
error_codes['error_code'] = [601, 602]
error_codes['error_description'] = ['update record not available in right table',
                                    'New record not available in right table']
                                      
merged_df['error_code'] = np.select(conditions, error_codes['error_code'])
merged_df['error_description'] = np.select(conditions, error_codes['error_description'])

I get the following error. Please share suggestions for resolving it.

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  validate_df['error_code'] = np.select(conditions, error_codes['error_code'])

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  validate_df['error_description'] = np.select(conditions, error_codes['error_description'])

Thanks,

Raghunath.

Note: the code runs fine on the sample data, but the error appears when there is more data.

【Comments】:

  • Try adding .copy(): merged_df = merged_df.query('_merge != "right_only"').copy()
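
A minimal sketch of the `.copy()` suggestion above, applied to the sample data from the question. The construction of `left_df` and `right_df` and the string default `"0"` for the description column are assumptions made for illustration; the rest follows the question's code.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed construction; NaN makes end_date a float column).
left_df = pd.DataFrame({
    "id": [13759, 23806, 21347, 36904],
    "rec_type": ["U", "N", "U", "N"],
    "end_date": [20210113, np.nan, 20210113, np.nan],
})
right_df = pd.DataFrame({"id": [23806, 21347]})

merged_df = pd.merge(left_df, right_df, how="outer", on=["id"], indicator=True)

# .copy() makes the filtered result an independent DataFrame, so the column
# assignments below no longer target a slice of the merged frame.
merged_df = merged_df.query('_merge != "right_only"').copy()

left_only = merged_df["_merge"] == "left_only"
conditions = [
    left_only & (merged_df["rec_type"] == "U") & merged_df["end_date"].notnull(),
    left_only & (merged_df["rec_type"] == "N") & merged_df["end_date"].isnull(),
]

merged_df["error_code"] = np.select(conditions, [601, 602], default=0)
merged_df["error_description"] = np.select(
    conditions,
    ["update record not available in right table",
     "New record not available in right table"],
    default="0",  # assumption: a string default keeps choices and default the same type
)

print(merged_df)
```

Rows that match neither condition fall through to the `default` value, which is how the `both` rows end up with 0 in the expected output.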

Tags: python-3.x pandas dataframe numpy


【Solution 1】:

I was able to solve the problem by changing

merged_df = merged_df.query('_merge != "right_only"')

to the code below

merged_df = merged_df[merged_df._merge != "right_only"]
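
As an alternative sketch, not part of the original answer: since the `right_only` rows are discarded anyway, a left merge never produces them, so no filtering step is needed and the new columns are assigned to the merge result directly. The helper name `flag_missing_records`, the extra id 99999 in `right_df`, and the string default `"0"` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def flag_missing_records(left_df, right_df):
    """Hypothetical helper: left-merge on id, then add the two error columns."""
    # how="left" keeps every left row; _merge is only "left_only" or "both".
    merged = pd.merge(left_df, right_df, how="left", on=["id"], indicator=True)

    left_only = merged["_merge"] == "left_only"
    conditions = [
        left_only & (merged["rec_type"] == "U") & merged["end_date"].notnull(),
        left_only & (merged["rec_type"] == "N") & merged["end_date"].isnull(),
    ]
    merged["error_code"] = np.select(conditions, [601, 602], default=0)
    merged["error_description"] = np.select(
        conditions,
        ["update record not available in right table",
         "New record not available in right table"],
        default="0",  # assumption: string default to match the string choices
    )
    return merged


left_df = pd.DataFrame({
    "id": [13759, 23806, 21347, 36904],
    "rec_type": ["U", "N", "U", "N"],
    "end_date": [20210113, np.nan, 20210113, np.nan],
})
# 99999 is a made-up id that exists only on the right side.
right_df = pd.DataFrame({"id": [23806, 21347, 99999]})

out = flag_missing_records(left_df, right_df)
print(out)
```

With a left merge the row order of `left_df` is preserved, and the right-only id 99999 never appears in the result.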
