【Question title】: Delete duplicated rows based on conditions in pandas
【Posted】: 2020-03-13 19:30:10
【Question】:

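I want to delete rows from a DataFrame when (x1, x2, x3) is the same across different rows, and save the ids of all deleted rows in a variable.

For example, with this data I want to delete the second row: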

import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

【Comments】:

  • df.drop_duplicates(['x2','x3','x4'])?
  • But how do I save the ids of the deleted rows?
  • deleted_ids = df.loc[df.duplicated(['x2','x3','x4']), 'id']?
  • @QuangHoang No, thanks, but the duplicates are on x1 & x2 & x3. Please see my example.
  • Then pass ['x1','x2','x3'] instead of ['x2','x3','x4']?
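
Putting the comments above together, here is a minimal sketch of the `duplicated` approach on the question's sample data. The names `key_cols`, `is_dup`, and `deduped` are mine and only illustrative; `keep='first'` is an assumption that the first occurrence of each (x1, x2, x3) combination is the one to keep.

```python
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13],
     'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# columns that define a duplicate (illustrative name)
key_cols = ['x1', 'x2', 'x3']

# True for every row whose (x1, x2, x3) already appeared in an earlier row
is_dup = df.duplicated(subset=key_cols, keep='first')

# ids of the rows that are about to be dropped
deleted_ids = df.loc[is_dup, 'id'].tolist()

# the DataFrame without the duplicated rows
deduped = df.loc[~is_dup].reset_index(drop=True)

print(deleted_ids)
print(deduped)
```

Computing the mask once and using it for both the ids and the filtered frame keeps the two results consistent with each other.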

Tags: python-3.x pandas dataframe


【Solution 1】:
import pandas as pd

#input data
d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

#create new column where contents from x1, x2 and x3 columns are merged
df['MergedColumn'] = df[df.columns[1:4]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)

#remove duplicates based on the created column and drop created column
df1 = pd.DataFrame(df.drop_duplicates("MergedColumn", keep='first').drop(columns="MergedColumn"))

#print output dataframe
print(df1)

#merge two dataframes
df2 = pd.merge(df, df1, how='left', on='id')
#find rows with null values in the right table (rows that were removed)
df2 = df2[df2['x1_y'].isnull()]

#prints ids of rows that were removed
print(df2['id'])
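
As a possible simplification of the steps above (not the answer author's code): `drop_duplicates` accepts a `subset` of columns directly, so the helper column may not be needed, and the removed ids can be found with `isin` instead of a merge. This sketch assumes the `id` values are unique; `removed_ids` is a name I made up.

```python
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13],
     'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# keep the first row of each (x1, x2, x3) combination
df1 = df.drop_duplicates(subset=['x1', 'x2', 'x3'], keep='first')

# ids present in the original frame but missing from the deduplicated one
# (assumes 'id' is unique per row)
removed_ids = df.loc[~df['id'].isin(df1['id']), 'id'].tolist()

print(df1)
print(removed_ids)
```

Comparing the columns directly also sidesteps a corner case of the joined-string key: because missing values are dropped before joining, two different rows containing NaN in different columns could end up with the same string.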
