[Posted]: 2021-07-19 10:19:56
[Question]:
I am trying to add two new columns whose values depend on two conditions.
Sample source data for the left and right DataFrames.

left_df:

| id | rec_type | end_date |
|---|---|---|
| 13759 | U | 20210113 |
| 23806 | N | NaN |
| 21347 | U | 20210113 |
| 36904 | N | NaN |

right_df:

| id |
|---|
| 23806 |
| 21347 |
Expected output:
| id | rec_type | end_date | _merge | error_code | error_description |
|---|---|---|---|---|---|
| 13759 | U | 20210113 | left_only | 601 | update record not available in right table |
| 23806 | N | NaN | both | 0 | 0 |
| 21347 | U | 20210113 | both | 0 | 0 |
| 36904 | N | NaN | left_only | 602 | New record not available in right table |
I am using numpy (np) select to do this, as shown in the code below, but I get an error.
import pandas as pd
import numpy as np

merged_df = pd.merge(left_df, right_df,
                     how='outer',
                     on=['id'],
                     indicator=True)
merged_df = merged_df.query('_merge != "right_only"')

conditions = [((merged_df['_merge'] == "left_only") &
               (merged_df['rec_type'] == "U") &
               (merged_df['end_date'].notnull())),
              ((merged_df['_merge'] == "left_only") &
               (merged_df['rec_type'] == "N") &
               (merged_df['end_date'].isnull()))]

error_codes = dict()
error_codes['error_code'] = [601, 602]
error_codes['error_description'] = ['update record not available in right table',
                                    'New record not available in right table']

merged_df['error_code'] = np.select(conditions, error_codes['error_code'])
merged_df['error_description'] = np.select(conditions, error_codes['error_description'])
I get the following error. Please share any suggestions for resolving it.
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  validate_df['error_code'] = np.select(conditions, error_codes['error_code'])
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  validate_df['error_description'] = np.select(conditions, error_codes['error_description'])
Thanks,
Raghunath.
Note: the code runs fine on the sample data, but the error appears when there is more data.
[Comments]:
- Try adding .copy(): merged_df = merged_df.query('_merge != "right_only"').copy()
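A minimal sketch of that suggestion applied to the question's code, in case it helps. The DataFrame construction below is my own reconstruction of the sample tables, and passing `default="0"` for the description column is an assumption on my part (it keeps every choice a string, and matches the `0` shown in the expected output). I have not verified this against the larger dataset mentioned in the note.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the tables in the question (assumed dtypes).
left_df = pd.DataFrame({
    "id": [13759, 23806, 21347, 36904],
    "rec_type": ["U", "N", "U", "N"],
    "end_date": ["20210113", np.nan, "20210113", np.nan],
})
right_df = pd.DataFrame({"id": [23806, 21347]})

merged_df = pd.merge(left_df, right_df, how="outer", on=["id"], indicator=True)

# .copy() makes the filtered result an independent DataFrame, so the column
# assignments below are not made on a slice of the merge result.
merged_df = merged_df.query('_merge != "right_only"').copy()

left_only = merged_df["_merge"] == "left_only"
conditions = [
    left_only & (merged_df["rec_type"] == "U") & merged_df["end_date"].notnull(),
    left_only & (merged_df["rec_type"] == "N") & merged_df["end_date"].isnull(),
]
descriptions = [
    "update record not available in right table",
    "New record not available in right table",
]

# Rows matching neither condition receive the default value.
merged_df["error_code"] = np.select(conditions, [601, 602], default=0)
merged_df["error_description"] = np.select(conditions, descriptions, default="0")

print(merged_df)
```

Since `right_only` rows are dropped right after the merge anyway, `how='left'` might let you skip the `query` step altogether, but that is a separate change from what the comment proposes.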
Tags: python-3.x pandas dataframe numpy