【Posted】: 2019-01-11 10:43:01
【Question】:
import pandas as pd
dfa = {'account': ['a', 'b', 'a', 'c', 'a'],
       'ret_type': ['CTR', 'WO', 'T', 'CTR', 'T'],
       'val': ['0.0', '0.1', '0.2', '0.3', '0.4'],
       'ins_date': ['11', '12', '11', '13', '14']}
df = pd.DataFrame(dfa)
  account ret_type  val ins_date
0       a      CTR  0.0       11
1       b       WO  0.1       12
2       a        T  0.2       11
3       c      CTR  0.3       13
4       a        T  0.4       14
I have a requirement to eliminate duplicate rows, where:
1. A duplicate means rows that share the same combination of (account, ins_date).
2. If a duplicate is found, I need to keep the row with ret_type CTR and drop the row with ret_type T.
3. I don't want to delete T rows that have no duplicate, such as row 4.
4. In this example, row 2 is the one that should be deleted from the final output.
How can I do this?
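【Comments】:
-
What is the expected output? Please post it; that would make the question much easier to read.
-
As stated in point 4, the desired output is df without row 2 (a T 0.2 11).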
-
Based on the combination of account, return_type, ins_dat, there are no duplicates in your example. Can you add some?
-
Sorry, it is based only on account and ins_date.
-
df.drop_duplicates(subset = ['account', 'ins_date'])?
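-
A note on the one-liner above: drop_duplicates keeps the first row of each (account, ins_date) group by default, so it gives the desired result here only because the CTR row happens to come before the T row. Below is a minimal sketch that does not depend on row order. It assumes the rule is "drop a T row whenever another row with the same account and ins_date has ret_type CTR"; the names keys, is_ctr, group_has_ctr and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'account':  ['a', 'b', 'a', 'c', 'a'],
    'ret_type': ['CTR', 'WO', 'T', 'CTR', 'T'],
    'val':      ['0.0', '0.1', '0.2', '0.3', '0.4'],
    'ins_date': ['11', '12', '11', '13', '14'],
})

keys = ['account', 'ins_date']

# Helper column: True where the row itself is a CTR row.
is_ctr = df['ret_type'].eq('CTR')

# For every row, True if its (account, ins_date) group contains a CTR row.
group_has_ctr = df.assign(is_ctr=is_ctr).groupby(keys)['is_ctr'].transform('any')

# Drop only the T rows that share their key with a CTR row.
# T rows without a CTR partner (such as row 4) are kept.
to_drop = df['ret_type'].eq('T') & group_has_ctr
result = df[~to_drop]

print(result)

# For comparison: the order-dependent one-liner from the comment above.
first_only = df.drop_duplicates(subset=keys)
```

On the sample data both result and first_only keep rows 0, 1, 3 and 4. They would differ if the T row appeared before its CTR partner, in which case only the mask-based version would still drop the T row.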
Tags: python python-3.x pandas dataframe data-science