[Question title]: How to delete a list within a list (i.e., a sublist) if any element of that sublist is in another list?
[Posted]: 2013-09-05 20:24:20
[Question description]:

I have a list containing many sublists. For example:

full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]

I also have another list, called omit. For example:

omit = [99, 60, 98]

I want to delete a sublist from full_list if any element of that sublist is in the omit list. For example, I would like the resulting list to be:

reduced_list = [[1, 1, 3, 4], [2, 4, 4]]

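because only these sublists contain no element from the omit list.

I am guessing there is some simple way to do this with a list comprehension, but I can't get it to work. I have tried many things, for example: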

reduced_list = [sublist for sublist in full_list if item for sublist not in omit]

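This code raises an error (invalid syntax), but I think I am missing more than that.

Any help would be much appreciated!

P.S. The above is a simplified version of the problem. My ultimate goal is to remove sublists from a very long list of lists of strings (e.g., 500,000 sublists) whenever any element (a string) of a sublist appears in an "omit" list containing more than 2000 strings.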

[Question comments]:

You guys are awesome! Thanks for the responses. It works like a charm on the longer lists.

Tags: python performance list sublist

[Solution 1]:

Use a set and all():

>>> omit = {99, 60, 98}
>>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
>>> [item for item in full_list if all(x not in omit for x in item)]
[[1, 1, 3, 4], [2, 4, 4]]

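The main difference between this method and @alecxe's (or @Óscar López's) solution is that all() short-circuits and does not create any set or list in memory, whereas set intersection returns a new set containing every item in common with the omit set, which is then checked for emptiness to decide whether anything was shared. (Set intersection runs internally at C speed, so per element it is faster than the plain Python-level loop that all() runs.)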
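
To make the short-circuiting difference concrete, here is a minimal sketch in Python 3. The helper name count_checks and the counting wrapper are illustrative assumptions, not part of any answer in this thread. It reports how many elements the all() version actually inspects before it can decide about a sublist:

```python
def count_checks(sublist, omit):
    """Return (keep, inspected): whether the sublist survives, and how many
    elements all() looked at before it could decide."""
    inspected = 0

    def not_omitted(x):
        nonlocal inspected
        inspected += 1
        return x not in omit

    # all() stops consuming the generator at the first False value
    keep = all(not_omitted(x) for x in sublist)
    return keep, inspected


omit = {99, 60, 98}
print(count_checks([3, 99, 5, 2], omit))   # (False, 2): stops as soon as 99 is seen
print(count_checks([1, 1, 3, 4], omit))    # (True, 4): has to scan every element
print(len(omit & set([3, 99, 5, 2])))      # 1: a full set(item) and a result set were built
```

The earlier an omitted element appears in a sublist, the less work all() does, which is consistent with the "all items intersect" timing below.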

Timing comparisons:

>>> import random

No items intersect:

>>> omit = set(random.randrange(1, 10**18) for _ in xrange(100000))
>>> full_list = [[random.randrange(10**19, 10**100) for _ in xrange(100)] for _ in xrange(1000)]

>>> %timeit [item for item in full_list if not omit & set(item)]
10 loops, best of 3: 43.3 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
10 loops, best of 3: 28 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
10 loops, best of 3: 65.3 ms per loop

All items intersect:

>>> full_list = [range(10**3) for _ in xrange(1000)]
>>> omit = set(xrange(10**3))
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 148 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 108 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
100 loops, best of 3: 1.62 ms per loop

Some items intersect:

>>> omit = set(xrange(1000, 10000))
>>> full_list = [range(2000) for _ in xrange(1000)]
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 282 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 159 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
1 loops, best of 3: 227 ms per loop
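
The sessions above use Python 2 (xrange) and IPython's %timeit. For anyone who wants to rerun the comparison on Python 3, the following is a rough sketch using the standard timeit module. The function names, data sizes, and seed are my own assumptions, and the numbers will differ from those shown above:

```python
import random
import timeit


def with_and(full_list, omit):
    # builds set(item) and then a result set for every sublist
    return [item for item in full_list if not omit & set(item)]


def with_intersection(full_list, omit):
    # intersection() accepts any iterable, so no set(item) is needed
    return [x for x in full_list if not omit.intersection(x)]


def with_all(full_list, omit):
    # short-circuits at the first omitted element
    return [item for item in full_list if all(x not in omit for x in item)]


def compare(full_list, omit, number=5):
    """Time each variant and return {name: total seconds for `number` runs}."""
    results = {}
    for func in (with_and, with_intersection, with_all):
        results[func.__name__] = timeit.timeit(
            lambda: func(full_list, omit), number=number)
    return results


random.seed(0)  # fixed seed so reruns use the same data
omit = set(range(1000, 10000))
full_list = [[random.randrange(0, 20000) for _ in range(200)] for _ in range(200)]

for name, seconds in compare(full_list, omit).items():
    print(f"{name}: {seconds:.4f} s")
```

Which variant wins depends on how early an omitted element shows up in each sublist, so it is worth timing with data shaped like your own.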

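[Comments]:

Since you're already using sets, it would be better and faster to use native set operations.
@Anorov I could, but creating a set in memory just to check its length is wasteful.

[Solution 2]:

Try this: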

full_list = [[1, 1, 3, 4], [3, 99, 5, 2], [2, 4, 4], [3, 4, 5, 2, 60]]
omit = frozenset([99, 60, 98])
reduced_list = [x for x in full_list if not omit.intersection(x)]

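The only change I made to the input data is that omit is now a set, for efficiency reasons, because that allows fast intersections (it is frozen because we are not going to modify it). Note that x does not have to be a set. The reduced_list variable now contains the expected value: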

reduced_list
=> [[1, 1, 3, 4], [2, 4, 4]]
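
Because the asker's real data consists of strings, here is a small sketch of the same pattern applied to strings (the sample words are made up). Strings are hashable, so nothing else changes. It also shows isdisjoint, a built-in set/frozenset method that is not used in the answers here: it likewise accepts any iterable, and returns a boolean directly instead of a result set.

```python
full_list = [["apple", "pear"], ["kiwi", "spam"], ["fig"], ["plum", "eggs", "lime"]]
omit = frozenset(["spam", "eggs", "ham"])

# Same approach as above: keep a sublist only if it shares nothing with omit
reduced_list = [x for x in full_list if not omit.intersection(x)]

# Alternative (my substitution, not from the answer): ask for disjointness directly
reduced_alt = [x for x in full_list if omit.isdisjoint(x)]

print(reduced_list)                 # [['apple', 'pear'], ['fig']]
print(reduced_alt == reduced_list)  # True
```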

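[Comments]:

It would be better to make omit a set up front, rather than converting it on every iteration of the loop.
@Anorov Way ahead of you :)
+1 for frozenset, it's a bit faster (check the mini-benchmark in my answer)

[Solution 3]:

Make omit a set and check the intersection on each iteration: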

>>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
>>> omit = [99, 60, 98]
>>> omit = set(omit)  # or just omit = {99, 60, 98} for python >= 2.7
>>> [item for item in full_list if not omit & set(item)]
[[1, 1, 3, 4], [2, 4, 4]]

FYI, it's better to use a frozenset instead of a set, as @Óscar López suggested. With a frozenset, it runs slightly faster:

import timeit


def omit_it(full_list, omit):
    return [item for item in full_list if not omit & set(item)]

# print(...) with a single argument works on both Python 2 and Python 3
print(timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], {99, 60, 98})',
                   'from __main__ import omit_it').timeit(10000))

print(timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], frozenset([99, 60, 98]))',
                   'from __main__ import omit_it').timeit(10000))

This prints:

0.0334849357605
0.0319349765778

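[Comments]:

I was going to post a solution using any(), but I think this is better and more Pythonic.
Note that omit & set(item) does not short-circuit and creates a new set in memory.
@AshwiniChaudhary Yes, it's really interesting to see which is faster.. it depends on the input, of course.
@AshwiniChaudhary That's a good point, but because of how it is implemented, set.intersection() may still be faster, even taking the short-circuiting of all() into account. It would be interesting to see a benchmark.
@alecxe By the way, you can use not omit.intersection(item), which will be faster than not omit & set(item).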