If you open a Python interpreter, you will find that "doc" and "pdf" and "xls" and "jpg" evaluates to the same thing as 'jpg':
>>> "doc" and "pdf" and "xls" and "jpg"
'jpg'
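This happens because and returns its last operand when every operand is truthy, and any non-empty string is truthy. So your first attempt tests only for "jpg", not for all of the strings.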
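A minimal sketch of the pitfall, assuming the original test looked roughly like a chained and followed by a membership check (the sample value of text is hypothetical):

```python
# `and` returns the first falsy operand, or the last operand if all are truthy.
chained = "doc" and "pdf" and "xls" and "jpg"
print(chained)  # jpg

text = "report.pdf"  # hypothetical sample input

# Equivalent to `"jpg" in text`, so the "pdf" in the name goes unnoticed.
naive_hit = chained in text
print(naive_hit)  # False

# Checking each string separately does catch it.
proper_hit = any(s in text for s in ["doc", "pdf", "xls", "jpg"])
print(proper_hit)  # True
```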
There are many ways to do what you want. The following is not the most obvious, but it is useful:
if not any(test_string in text for test_string in ["doc", "pdf", "xls", "jpg"]):
    filtered.append(text)
Another approach is to combine a for loop with an else clause, which runs only if the loop finishes without hitting break:
for test_string in ["doc", "pdf", "xls", "jpg"]:
    if test_string in text:
        break
else:
    filtered.append(text)
Finally, you can use a pure list comprehension:
tofilter = ["one.pdf", "two.txt", "three.jpg", "four.png"]
test_strings = ["doc", "pdf", "xls", "jpg"]
filtered = [s for s in tofilter if not any(t in s for t in test_strings)]
Edit:
If you want to filter on both words and extensions, I suggest the following:
text_list = generate_text_list() # or whatever you do to get a text sequence
extensions = ['.doc', '.pdf', '.xls', '.jpg']
words = ['some', 'words', 'to', 'filter']
text_list = [text for text in text_list if not text.endswith(tuple(extensions))]
text_list = [text for text in text_list if not any(word in text for word in words)]
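This can still produce some false matches: because the word check is a plain substring test, the above also filters out "Do something", "He's a wordsmith", and so on. If that is a problem, you may need a more sophisticated solution.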
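One possible refinement, offered as a sketch rather than the definitive fix, is to match whole words only with a regular expression. The sample strings and the case-insensitive flag are assumptions made for illustration:

```python
import re

words = ["some", "words", "to", "filter"]

# \b anchors each alternative at word boundaries, so "some" no longer
# matches inside "something". re.escape protects any special characters.
# re.IGNORECASE is an assumption; drop it for case-sensitive matching.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
    re.IGNORECASE,
)

# Hypothetical sample input.
samples = ["Do something", "He's a wordsmith", "some text", "nothing here"]

kept = [s for s in samples if not pattern.search(s)]
print(kept)  # ['Do something', "He's a wordsmith", 'nothing here']
```

Only "some text" is dropped here, since it contains "some" as a standalone word.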