How to only keep dataframe rows that share the same value in a specific column: Answers

【Question Title】: How to only keep dataframe rows that share the same value in a specific column
【Posted】: 2021-11-30 17:21:49
【Question Description】:

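I have two dataframes that I want to compare, but first I want to make sure that the first column (which I use as the index) is the same in both.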

df1

    A   B   C   D   E
0   a   10  5   18  20
1   b   9   18  11  13
2   c   8   7   12  5
3   z   6   5   3   90

df2

    A   B   C   D   E
0   a   10  45  10  22
1   b   99  18  11  13
2   e   8   7   12  5
3   f   6   5   3   90

I only want to keep the rows whose value in column A is present in both dataframes, so for df1 and df2 I want this output:

df3

    A   B   C   D   E
0   a   10  5   18  20
1   b   9   18  11  13

df4

    A   B   C   D   E
0   a   10  45  10  22
1   b   99  18  11  13

I also want to retrieve the rows that were removed.

deleted_df

    A   B   C   D   E 
0   c   8   7   12  5
1   z   6   5   3   90
2   e   8   7   12  5
3   f   6   5   3   90

Here is what I have tried so far:

df3 = df1[df1['A'].isin(df2['A'])]
df4 = df2[df2['A'].isin(df1['A'])]

This seems to work, but I'm not sure. I would also like to retrieve the difference between df3 and df1 (and likewise between df4 and df2).
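A minimal sketch of the approach from the question, assuming the sample frames shown above: negating each boolean mask with `~` selects exactly the rows the filter dropped, and `pd.testing.assert_series_equal` can confirm that the surviving `A` columns line up. The names `mask1`/`mask2` are illustrative; the check assumes the shared keys appear in the same order in both frames.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "A": ["a", "b", "c", "z"],
    "B": [10, 9, 8, 6],
    "C": [5, 18, 7, 5],
    "D": [18, 11, 12, 3],
    "E": [20, 13, 5, 90],
})
df2 = pd.DataFrame({
    "A": ["a", "b", "e", "f"],
    "B": [10, 99, 8, 6],
    "C": [45, 18, 7, 5],
    "D": [10, 11, 12, 3],
    "E": [22, 13, 5, 90],
})

# Boolean masks: True where the key in A also occurs in the other frame
mask1 = df1["A"].isin(df2["A"])
mask2 = df2["A"].isin(df1["A"])

df3 = df1[mask1]
df4 = df2[mask2]

# Raises AssertionError if the key columns differ
# (assumes the shared keys are in the same order in both frames)
pd.testing.assert_series_equal(
    df3["A"].reset_index(drop=True), df4["A"].reset_index(drop=True)
)

# ~mask selects the dropped rows; ignore_index renumbers them 0..n-1
deleted_df = pd.concat([df1[~mask1], df2[~mask2]], ignore_index=True)
print(deleted_df)
```

With the sample data this yields the rows c, z, e, f in that order, matching the `deleted_df` layout asked for in the question.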

【Question Discussion】:

Tags: python python-3.x pandas

【Solution 1】:

One thing you can do is an outer merge with indicator=True:

>>> df1.merge(df2, on='A', indicator=True, how='outer', suffixes=('1',  '2',))
 
   A    B1    C1    D1    E1    B2    C2    D2    E2      _merge
0  a  10.0   5.0  18.0  20.0  10.0  45.0  10.0  22.0        both
1  b   9.0  18.0  11.0  13.0  99.0  18.0  11.0  13.0        both
2  c   8.0   7.0  12.0   5.0   NaN   NaN   NaN   NaN   left_only
3  z   6.0   5.0   3.0  90.0   NaN   NaN   NaN   NaN   left_only
4  e   NaN   NaN   NaN   NaN   8.0   7.0  12.0   5.0  right_only
5  f   NaN   NaN   NaN   NaN   6.0   5.0   3.0  90.0  right_only

This way, the _merge column tells you whether each row came from both dataframes or only from the left or the right one.
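One way to turn the indicator into the frames the question asks for is sketched below, assuming the question's sample data. Merging only the key column keeps the result small; the `_merge` labels (`both`, `left_only`, `right_only`) then pick which keys to keep from each original frame. Filtering the original frames with `isin` preserves their own row order, independent of how `merge` orders its output. The variable names here are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": ["a", "b", "c", "z"],
    "B": [10, 9, 8, 6],
    "C": [5, 18, 7, 5],
    "D": [18, 11, 12, 3],
    "E": [20, 13, 5, 90],
})
df2 = pd.DataFrame({
    "A": ["a", "b", "e", "f"],
    "B": [10, 99, 8, 6],
    "C": [45, 18, 7, 5],
    "D": [10, 11, 12, 3],
    "E": [22, 13, 5, 90],
})

# Outer-merge just the key column; indicator=True adds a _merge column
keys = df1[["A"]].merge(df2[["A"]], on="A", how="outer", indicator=True)

# Split the keys by where they were found
common = keys.loc[keys["_merge"] == "both", "A"]
left_only = keys.loc[keys["_merge"] == "left_only", "A"]
right_only = keys.loc[keys["_merge"] == "right_only", "A"]

# Rebuild the requested frames from the originals
df3 = df1[df1["A"].isin(common)]
df4 = df2[df2["A"].isin(common)]
deleted_df = pd.concat(
    [df1[df1["A"].isin(left_only)], df2[df2["A"].isin(right_only)]],
    ignore_index=True,
)
```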

【Discussion】:

【Solution 2】:

Use isin:

df1.loc[df1.A.isin(df2.A)]

   A   B   C   D   E
0  a  10   5  18  20
1  b   9  18  11  13

isin returns a boolean Series that is used for filtering:

df1.A.isin(df2.A)
0     True
1     True
2    False
3    False
Name: A, dtype: bool

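For the deleted rows:

import pandas as pd

# Make A the index so set operations can be applied to the keys
df1 = df1.set_index('A')
df2 = df2.set_index('A')
# Keys present in only one of the two frames
deleted = df1.index.symmetric_difference(df2.index)
pd.concat([df1, df2]).loc[deleted]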
   B  C   D   E
A              
c  8  7  12   5
e  8  7  12   5
f  6  5   3  90
z  6  5   3  90
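The snippet above reassigns df1 and df2 with A moved into the index. A variant that leaves the original frames untouched and keeps A as a regular column is sketched here, assuming the question's sample data: it builds the same symmetric difference from `pd.Index` objects, then selects those keys from the stacked frames with `isin`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": ["a", "b", "c", "z"],
    "B": [10, 9, 8, 6],
    "C": [5, 18, 7, 5],
    "D": [18, 11, 12, 3],
    "E": [20, 13, 5, 90],
})
df2 = pd.DataFrame({
    "A": ["a", "b", "e", "f"],
    "B": [10, 99, 8, 6],
    "C": [45, 18, 7, 5],
    "D": [10, 11, 12, 3],
    "E": [22, 13, 5, 90],
})

# Keys present in exactly one of the two frames
dropped = pd.Index(df1["A"]).symmetric_difference(pd.Index(df2["A"]))

# Stack both frames (A stays a column) and keep only the dropped keys
stacked = pd.concat([df1, df2], ignore_index=True)
deleted_df = stacked[stacked["A"].isin(dropped)].reset_index(drop=True)
print(deleted_df)
```

Because the rows are taken from the stacked frames rather than looked up by label, they come out in their original order (c, z, e, f) instead of the sorted order that `symmetric_difference` produces.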

【Discussion】: