Delete duplicated rows based on conditions in pandas: answers

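【Question title】: Delete duplicated rows based on conditions in pandas
【Posted】: 2020-03-13 19:30:10
【Question】:

I want to delete rows from a DataFrame when (x1, x2, x3) is the same across different rows, and save the ids of all deleted rows in a variable.

For example, with this data I want to delete the second row: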

import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

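【Comments】:

df.drop_duplicates(['x2','x3','x4'])?
But how do I save the ids of the deleted rows?
deleted_ids = df.loc[df.duplicated(['x2','x3','x4']), 'id']?
@QuangHoang No, thank you, but the duplicates are defined by x1 & x2 & x3. Please look at my example.
Then pass ['x1','x2','x3'] instead of ['x2','x3','x4']?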
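
The suggestion in the comments above can be sketched as follows. This is a minimal illustration, assuming "duplicate" means a row whose (x1, x2, x3) values repeat an earlier row and that the first occurrence should be kept; the names `key_cols`, `is_dup`, and `deduped` are illustrative and do not appear in the thread.

```python
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13],
     'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# Columns that define a duplicate (assumption taken from the question text)
key_cols = ['x1', 'x2', 'x3']

# Boolean mask: True for each row that repeats an earlier row on key_cols
is_dup = df.duplicated(subset=key_cols, keep='first')

# Save the ids of the rows about to be dropped, then drop them
deleted_ids = df.loc[is_dup, 'id'].tolist()
deduped = df.loc[~is_dup].reset_index(drop=True)

print(deleted_ids)
print(deduped)
```

Computing the mask once and reusing it for both the id list and the filtered frame keeps the two results consistent with each other.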

Tags: python-3.x pandas dataframe

【Solution 1】:

import pandas as pd

#input data
d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

#create new column where contents from x1, x2 and x3 columns are merged
df['MergedColumn'] = df[df.columns[1:4]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)

#remove duplicates based on the created column and drop created column
df1 = pd.DataFrame(df.drop_duplicates("MergedColumn", keep='first').drop(columns="MergedColumn"))

#print output dataframe
print(df1)

#merge two dataframes
df2 = pd.merge(df, df1,  how='left', on = 'id')
#find rows with null values in the right table (rows that were removed)
df2 = df2[df2['x1_y'].isnull()]

#prints ids of rows that were removed
print(df2['id'])
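
The last step above (left merge, then look for nulls on the right side) is an anti-join. A possible variant, not the answerer's code, uses the `indicator=True` option of `merge` so that no suffixed column such as `x1_y` has to be inspected. In this sketch the de-duplication is done with `drop_duplicates(subset=...)` instead of the helper string column, and the names `kept`, `flagged`, and `removed_ids` are illustrative.

```python
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13],
     'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# Rows that survive de-duplication on x1, x2, x3 (first occurrence kept)
kept = df.drop_duplicates(subset=['x1', 'x2', 'x3'], keep='first')

# Left-join the original ids against the kept ids. indicator=True adds a
# '_merge' column that is 'left_only' for ids missing from `kept`.
flagged = df[['id']].merge(kept[['id']], on='id', how='left', indicator=True)

# ids of the rows that were removed
removed_ids = flagged.loc[flagged['_merge'] == 'left_only', 'id'].tolist()

print(removed_ids)
```

Merging only the `id` column avoids the `_x`/`_y` suffixes that appear when both frames share other column names.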

【Comments】: