Find rows in a DataFrame that are not available in another DataFrame, excluding one column's values: answers

【Question title】: Find rows in dataframe which are not available in another dataframe excluding one column values
【Posted】: 2021-01-08 10:44:47
【Question】:

I have 2 DataFrames with the same column names:

#df1
col_1    col_2    col_3
1        10        100
2        20        40
3        30        50

#df2
col_1    col_2    col_3
5        10        200
3        20        500
3        30        700

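I want to compare these 2 DataFrames based only on col_1 and col_2, and find the rows in df1 whose col_1 and col_2 values do not exist in df2.

This is the desired output for the example above: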

#df
col_1    col_2    col_3
3        30        50

I tried this code, but it compares whole rows, and I only want to compare col_1 and col_2:

df = df1.merge(df2, how = 'outer',indicator=True).loc[lambda x : x['_merge']=='left_only']

【Comments】:

"I tried this code, but it compares whole rows, and I only want to compare col_1 and col_2" Then merge on just those columns: df1.merge(df2, on = ['col_1','col_2'], how = 'outer',indicator=True).loc....
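The suggestion in this comment can be filled out as a small runnable sketch. The sample data is copied from the question; the names `key_cols`, `flagged`, `only_in_df1` and `in_both` are illustrative only, and merging against just the de-duplicated key columns of df2 is an assumption made here so that df1's col_3 is left untouched.

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 40, 50]})
df2 = pd.DataFrame({"col_1": [5, 3, 3], "col_2": [10, 20, 30], "col_3": [200, 500, 700]})

key_cols = ["col_1", "col_2"]

# Merge against only the de-duplicated key columns of df2 (an assumption of
# this sketch), so df1's col_3 is kept as-is and no df1 row is repeated.
flagged = df1.merge(
    df2[key_cols].drop_duplicates(), on=key_cols, how="left", indicator=True
)

# "left_only" marks key pairs found only in df1; "both" marks shared pairs.
only_in_df1 = flagged.loc[flagged["_merge"] == "left_only", df1.columns]
in_both = flagged.loc[flagged["_merge"] == "both", df1.columns]

print(only_in_df1)
print(in_both)
```

With the sample data, `only_in_df1` holds the rows with pairs (1, 10) and (2, 20), while `in_both` holds (3, 30, 50), which is the row shown in the question's desired output. Note that `merge` returns a fresh 0..n-1 index, so a custom index on df1 would not survive this approach.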

Tags: python pandas dataframe

【Solution 1】:

Are the col_1, col_2 pairs unique in both DataFrames?

You can do this:

join_cols = ['col_1', 'col_2']
merged = df1[join_cols].join(df2.set_index(join_cols), on=join_cols)
not_in_df2 = merged.col_3.isnull()

If the join index is unique, not_in_df2 is aligned with df1. Otherwise, you can do the following, which works in both cases:

not_in_df2 = merged.index[not_in_df2].unique()

Finally,

df1.loc[not_in_df2]
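The join-based steps above can be put together as a minimal sketch on the question's sample data. The names `missing`, `labels` and `result` are illustrative, and the sketch assumes df2's col_3 contains no NaN values of its own, since a NaN after the join is what signals "no match".

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 40, 50]})
df2 = pd.DataFrame({"col_1": [5, 3, 3], "col_2": [10, 20, 30], "col_3": [200, 500, 700]})

join_cols = ["col_1", "col_2"]

# Look up each df1 key pair in df2; pairs without a match get NaN in col_3.
merged = df1[join_cols].join(df2.set_index(join_cols), on=join_cols)
missing = merged["col_3"].isnull()

# Collect the df1 index labels of the unmatched rows. unique() guards against
# labels that were repeated because df2 contained duplicate key pairs.
labels = merged.index[missing.to_numpy()].unique()

result = df1.loc[labels]
print(result)
```

Selecting by index label rather than by the boolean mask is what keeps this working when df2 has duplicate pairs, because the joined frame can then have more rows than df1.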

Edit:

Perhaps a better approach:

index = pd.MultiIndex.from_arrays([df2.col_1, df2.col_2]).unique()
target = pd.MultiIndex.from_arrays([df1.col_1, df1.col_2])
not_in_df2 = index.get_indexer(target) == -1  # -1 means the pair is absent from df2
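As a rough end-to-end illustration of the MultiIndex idea, the sketch below builds both key indexes with `MultiIndex.from_frame` (a choice made here, not part of the answer) and uses the resulting mask to select rows from df1. The names `df2_keys`, `df1_keys` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 40, 50]})
df2 = pd.DataFrame({"col_1": [5, 3, 3], "col_2": [10, 20, 30], "col_3": [200, 500, 700]})

key_cols = ["col_1", "col_2"]

# get_indexer needs a unique index to search in, hence the unique() call.
df2_keys = pd.MultiIndex.from_frame(df2[key_cols]).unique()
df1_keys = pd.MultiIndex.from_frame(df1[key_cols])

# For each df1 pair, get_indexer returns its position in df2_keys,
# or -1 when the pair does not occur in df2 at all.
not_in_df2 = df2_keys.get_indexer(df1_keys) == -1

# The mask is a plain boolean array in df1's row order.
result = df1[not_in_df2]
print(result)
```

Because the mask is positional, this variant does not depend on df1's index labels, and no intermediate merged frame is built.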

【Discussion】:

【Solution 2】:

If you want to use the pandas merge function, I suggest the following solution:

df = df1.merge(df2, how='inner', on = ['col_1', 'col_2'], suffixes=('', 'b'))
df = df.drop(df.columns.difference(df1.columns), axis = 1)
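To see what the merge-based snippet does on the question's data, here is a small worked sketch. The suffix `_df2` and the names `key_cols` and `matched` are illustrative choices, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 40, 50]})
df2 = pd.DataFrame({"col_1": [5, 3, 3], "col_2": [10, 20, 30], "col_3": [200, 500, 700]})

key_cols = ["col_1", "col_2"]

# An inner merge keeps only df1 rows whose key pair also occurs in df2.
# df2's col_3 collides with df1's, so it comes back renamed to "col_3_df2".
matched = df1.merge(df2, how="inner", on=key_cols, suffixes=("", "_df2"))
print(matched.columns.tolist())

# Drop every column that df1 does not have, restoring df1's layout.
matched = matched.drop(columns=matched.columns.difference(df1.columns))
print(matched)
```

On the sample data this leaves the single row (3, 30, 50), the row whose pair is present in both frames. If df2 contained the same pair more than once, the inner merge would repeat the corresponding df1 row, so de-duplicating df2's key columns first may be worth considering.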

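If not, here is a simpler solution. It compares the two DataFrames row by row, so it only applies when both share the same index and matching pairs sit in the same row:

df = df1.loc[(df1["col_1"] == df2["col_1"]) & (df1["col_2"] == df2["col_2"])]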
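The element-wise comparison above matches row i of df1 against row i of df2 only. The sketch below illustrates that behaviour on the question's data; `same_position_match` is a hypothetical helper introduced for this example, and `df2_reversed` is simply df2 with its rows in reverse order.

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 40, 50]})
df2 = pd.DataFrame({"col_1": [5, 3, 3], "col_2": [10, 20, 30], "col_3": [200, 500, 700]})


def same_position_match(left, right):
    """Hypothetical helper: True where row i of both frames shares col_1 and col_2."""
    return (left["col_1"] == right["col_1"]) & (left["col_2"] == right["col_2"])


# With the original row order, the pair (3, 30) sits in the last row of both frames.
aligned = df1[same_position_match(df1, df2)]

# Same rows as df2, reversed and given a fresh 0..n-1 index.
df2_reversed = df2.iloc[::-1].reset_index(drop=True)
misaligned = df1[same_position_match(df1, df2_reversed)]

print(aligned)
print(misaligned)
```

Here `aligned` contains the row (3, 30, 50), but `misaligned` is empty even though the pair (3, 30) still exists in `df2_reversed`. pandas also raises a ValueError when the two Series being compared do not carry identical index labels, so this shortcut suits only frames of the same shape and ordering.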

If you also want to reset the row numbers in the resulting DataFrame, you can simply use:

df = df.reset_index(drop = True)

【Discussion】: