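I see there are already several answers, but if I may, I'd like to add mine. I think SeaBean's solution is also correct, and I like that, unlike the other proposed solution, you don't have to specify which columns contain lists.
I can't explain my solution as well as SeaBean did, because this is a problem I ran into a while ago and, honestly, I arrived at it by trial and error.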
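import pandas as pd
d = {'A': ['string1', 'string2', 'string3'],
     'B': ['string4', 'string5', 'string6'],
     'C': [['string7', 'string10'], ['string8'], ['string9']],
     'D': [['string11', 'string12'], ['string13'], ['string14']],
     'E': [['string15'], ['string16'], ['string17']]}
df = pd.DataFrame(data=d)
to_delete = ['string7', 'string9', 'string17']  # values whose rows should be dropped
# Originally written with .any(level=0); pandas 2.0 removed that argument,
# so the equivalent .groupby(level=0).any() is used here.
res = df[~df.stack().explode().isin(to_delete).groupby(level=0).any()]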
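Since the one-liner above is hard to read at a glance, here is a step-by-step sketch of what each call in the chain appears to do, on a cut-down frame with only columns A and C. The intermediate names (`stacked`, `exploded`, `flagged`, `row_has_match`) are just for illustration, and it uses `.groupby(level=0).any()`, which should be equivalent to `.any(level=0)` and also works on pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['string1', 'string2', 'string3'],
    'C': [['string7', 'string10'], ['string8'], ['string9']],
})
to_delete = ['string7', 'string9', 'string17']

# 1. stack() turns the frame into one long Series indexed by (row label, column name).
stacked = df.stack()

# 2. explode() gives every list element its own entry; plain strings pass through unchanged.
exploded = stacked.explode()

# 3. isin() flags each individual value that appears in to_delete.
flagged = exploded.isin(to_delete)

# 4. Grouping on index level 0 (the original row label) reduces the flags to one boolean per row.
row_has_match = flagged.groupby(level=0).any()

# 5. Keep only the rows where nothing matched.
res = df[~row_has_match]
print(res)
```

Because everything is flattened into a single Series first, list columns and plain string columns are handled the same way, which is why no column names need to be listed.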
I also tried timing it with more items, and my solution appears to be somewhat faster.
import time
import pandas as pd
d = {'A': ['string1', 'string2', 'string3'] * 10000,
     'B': ['string4', 'string5', 'string6'] * 10000,
     'C': [['string7', 'string10'], ['string8'], ['string9']] * 10000,
     'D': [['string11', 'string12'], ['string13'], ['string14']] * 10000,
     'E': [['string15'], ['string16'], ['string17']] * 10000}
df = pd.DataFrame(data=d)
to_delete = ['string7', 'string9', 'string17']
# My solution (.groupby(level=0).any() replaces the original .any(level=0), removed in pandas 2.0)
start = time.time()
print(df[~df.stack().explode().isin(to_delete).groupby(level=0).any()])
end = time.time()
print(end - start)
# SeaBean's solution
start = time.time()
print(df.loc[df.apply(pd.Series.explode).isin(to_delete).groupby(level=0).sum().sum(axis=1).eq(0)])
end = time.time()
print(end - start)
Output:
A B C D E
1 string2 string5 [string8] [string13] [string16]
4 string2 string5 [string8] [string13] [string16]
7 string2 string5 [string8] [string13] [string16]
10 string2 string5 [string8] [string13] [string16]
13 string2 string5 [string8] [string13] [string16]
... ... ... ... ... ...
29986 string2 string5 [string8] [string13] [string16]
29989 string2 string5 [string8] [string13] [string16]
29992 string2 string5 [string8] [string13] [string16]
29995 string2 string5 [string8] [string13] [string16]
29998 string2 string5 [string8] [string13] [string16]
[10000 rows x 5 columns]
0.08804535865783691
A B C D E
1 string2 string5 [string8] [string13] [string16]
4 string2 string5 [string8] [string13] [string16]
7 string2 string5 [string8] [string13] [string16]
10 string2 string5 [string8] [string13] [string16]
13 string2 string5 [string8] [string13] [string16]
... ... ... ... ... ...
29986 string2 string5 [string8] [string13] [string16]
29989 string2 string5 [string8] [string13] [string16]
29992 string2 string5 [string8] [string13] [string16]
29995 string2 string5 [string8] [string13] [string16]
29998 string2 string5 [string8] [string13] [string16]
[10000 rows x 5 columns]
0.11187744140625
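The figures above come from a single run each and include the time spent printing, so they are only a rough indication. As a hedged sketch of a slightly more careful measurement, the snippet below times just the filtering step with `timeit.repeat`. The helper name `drop_rows_containing` and the smaller repeat count `n` are assumptions made for illustration, not part of the original test.

```python
import timeit
import pandas as pd


def drop_rows_containing(df, to_delete):
    """Drop rows where any cell, or any element of a list cell, is in to_delete."""
    mask = df.stack().explode().isin(to_delete).groupby(level=0).any()
    return df[~mask]


n = 1000  # hypothetical size, smaller than the 10000 used above to keep this quick
d = {'A': ['string1', 'string2', 'string3'] * n,
     'B': ['string4', 'string5', 'string6'] * n,
     'C': [['string7', 'string10'], ['string8'], ['string9']] * n,
     'D': [['string11', 'string12'], ['string13'], ['string14']] * n,
     'E': [['string15'], ['string16'], ['string17']] * n}
df = pd.DataFrame(data=d)
to_delete = ['string7', 'string9', 'string17']

# Five independent runs of one call each; the minimum is usually the least noisy figure.
timings = timeit.repeat(lambda: drop_rows_containing(df, to_delete), repeat=5, number=1)
res = drop_rows_containing(df, to_delete)
print(f"best of 5: {min(timings):.4f} s, rows kept: {len(res)}")
```

The same `timeit.repeat` call can be pointed at any other candidate function to compare them on equal terms.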