Fastest way to remove duplicates in a list without importing libraries and using sets: answers

【Question title】: Fastest way to remove duplicates in a list without importing libraries and using sets
【Posted】: 2020-04-18 13:38:18
【Question】:

I am trying to remove duplicates from a list using the following code:

a = [1,2,3,4,2,6,1,1,5,2]
res = []
[res.append(i) for i in a if i not in res]

But I would like to do this without first defining the target list as an empty list (i.e., omitting the line res = []), for example:

a = [1,2,3,4,2,6,1,1,5,2]
#Either:
res = [i for i in a if i not in res]
#Or:
[i for i in a if i not in 'this list'] # 'this list' is a placeholder, not a string: it means the list the comprehension is building

I want to avoid library imports and set().

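【Question comments】:

I believe you can't do that. Use set(a) to remove the duplicates; it is also a simple one-liner. If order matters, use a dict or OrderedDict, depending on your Python version, but that would be hacky.
I don't intend to use sets or imported libraries :)
Not everything involving lists is a natural candidate for a comprehension. Also, why use a quadratic algorithm?
This question sounds contrived. There are many (and more efficient) ways to achieve what you want.
Using set does not import any library.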

Tags: python list duplicates

【Solution 1】:

I think this might work for you. It removes duplicates from the list while preserving order.

newlist=[i for n,i in enumerate(L) if i not in L[:n]]
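
A minimal runnable sketch of the line above. The answer does not define L, so this assumes it is the list from the question (named a here):

```python
# Assumption: `a` stands in for the list the answer calls L.
a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

# Keep the element i at position n only if it does not already occur
# in a[:n], i.e. among the elements that come before it.
newlist = [i for n, i in enumerate(a) if i not in a[:n]]
print(newlist)  # [1, 2, 3, 4, 6, 5]
```

This needs neither set nor an import, and because the membership test only uses ==, it also works for unhashable elements. The price is that every element rescans a copy of its prefix, so the running time is quadratic in the worst case.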

【Comments】:

Very nice: using enumerate as a generator and checking against the slice of the list seen so far.

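【Solution 2】:

For Python 3.6+, you can use dict.fromkeys() (dicts keep insertion order as a CPython implementation detail in 3.6 and as a language guarantee from 3.7):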

>>> a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
>>> list(dict.fromkeys(a))
[1, 2, 3, 4, 6, 5]

From the docs:

Create a new dictionary with keys from iterable and values set to value.

If you are using an older version of Python, you need collections.OrderedDict to preserve the order:

>>> from collections import OrderedDict
>>> a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
>>> list(OrderedDict.fromkeys(a))
[1, 2, 3, 4, 6, 5]
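
To see why this removes duplicates, it can help to look at the intermediate dictionary. The following is only an illustrative sketch (the variable name d is introduced here) and assumes Python 3.7+ so that insertion order is guaranteed:

```python
a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

# fromkeys() turns every element into a key; the value defaults to None.
# A repeated element just reassigns an existing key, which keeps the
# position it got when it was first inserted.
d = dict.fromkeys(a)
print(d)        # {1: None, 2: None, 3: None, 4: None, 6: None, 5: None}

# Iterating a dict yields its keys, so list(d) is the deduplicated list.
print(list(d))  # [1, 2, 3, 4, 6, 5]
```

Since the elements become dictionary keys, they must be hashable; a list of lists, for example, would raise a TypeError here.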

【Comments】:

【Solution 3】:

Here is a simple benchmark of the suggested solutions.

It shows that dict.fromkeys performs best.

from simple_benchmark import BenchmarkBuilder
import random


b = BenchmarkBuilder()

@b.add_function()
def AmitDavidson(a):
    return [i for n,i in enumerate(a) if i not in a[:n]]

@b.add_function()
def RoadRunner(a):
    return list(dict.fromkeys(a))

@b.add_function()
def DaniMesejo(a):
    return list({k: '' for k in a})


@b.add_function()
def rdas(a):
    return sorted(list(set(a)), key=lambda x: a.index(x))


@b.add_function()
def unwanted_set(a):
    return list(set(a))


@b.add_arguments('List length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, [random.randint(0, 10) for _ in range(size)]

r = b.run()
r.plot()
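
simple_benchmark is a third-party package. If it is not installed, a rough comparison of two of the candidates can be sketched with the standard library's timeit instead. The function names, list size, and repeat count below are arbitrary choices for illustration, and the timings will differ from machine to machine:

```python
import random
import timeit

def slice_check(a):
    # Quadratic in the worst case: rescans a copy of the prefix a[:n]
    # for every element.
    return [i for n, i in enumerate(a) if i not in a[:n]]

def fromkeys(a):
    # Linear on average: one dict insertion per element.
    return list(dict.fromkeys(a))

# Same kind of input as the benchmark above: values drawn from 0..10.
data = [random.randint(0, 10) for _ in range(2**10)]

# Both approaches must agree before their timings are worth comparing.
assert slice_check(data) == fromkeys(data)

for func in (slice_check, fromkeys):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Note that with values limited to 0..10 the input contains at most 11 distinct elements, so inputs with many distinct values may rank the candidates differently.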

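【Comments】:

Nice. I was going to post something similar, but this is better. +1
This is great :) +1
Stack Overflow should generate these charts automatically.

【Solution 4】:

Here is a solution that uses set and does preserve the order: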

a = [1,2,3,4,2,6,1,1,5,2]
a_uniq = sorted(list(set(a)), key=lambda x: a.index(x))
print(a_uniq)

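【Comments】:

The main motivation for using set (beyond its conciseness) is to do the removal in sub-quadratic time, but the use of index pushes it back to quadratic time.
The same goes for the OP's comprehension.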
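
The point about index can be sketched as follows. Each a.index(x) call scans the list from the start, so calling it once per unique value is what makes the sort key expensive. One way to avoid that, shown here only as an illustration (first_index is a name introduced for this sketch), is to record each value's first position in a single pass:

```python
a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

# One pass over the list: remember the position where each value
# first appears. setdefault() leaves an existing entry untouched.
first_index = {}
for pos, value in enumerate(a):
    first_index.setdefault(value, pos)

# Sort the unique values by a dictionary lookup instead of a.index().
a_uniq = sorted(set(a), key=first_index.__getitem__)
print(a_uniq)  # [1, 2, 3, 4, 6, 5]
```

On Python 3.7+, list(first_index) already is the deduplicated list in original order, which is why the dict.fromkeys approach gets there more directly.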

【Solution 5】:

One-liner, comprehension, O(n), preserves order in Python 3.6+:

a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

res = list({k: '' for k in a})
print(res)

【Comments】: