Why does set() make this code run so much faster?

【Question title】: Why does set() make this code run so much faster?
【Posted】: 2014-09-14 03:03:48
【Question description】:

I wrote some code for Project Euler Problem 35:

#Project Euler: Problem 35

import time

start = time.time()

def sieve_erat(n):
    '''creates list of all primes < n'''
    x = range(2,n)
    b = 0
    while x[b] < int(n ** 0.5) + 1:
        x = filter(lambda y: y % x[b] != 0 or y == x[b], x)
        b += 1
    else:
        return x

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    b = set(primes)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in b:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

print circularPrimes(1000000)
elapsed = (time.time() - start)
print "Found in %s seconds" % elapsed

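I would like to know why the code above runs so fast when I set b = set(primes) in the circularPrimes function. It runs in about 8 seconds. Originally I did not set b = set(primes), and my circularPrimes function looked like this: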

def circularPrimes(n):
    '''returns # of circular primes below n'''
    count = 0
    primes = sieve_erat(n)
    for prime in primes:
        inc = 0
        a = str(prime)
        while inc < len(a):
            if int(a) not in primes:
                break
            a = a[-1] + a[0:len(a) - 1]
            inc += 1
        else:
            count += 1
    else:
        return count

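My initial code (without b = set(primes)) ran for so long that I did not wait for it to finish. I am curious why the running times of the two versions differ so much, since I do not believe primes contains any duplicates that would make it slower to go through than set(primes). Perhaps my idea of what set() does is wrong. Any help is welcome.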

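【Comments on the question】:

A membership check (contains) takes constant time on average for a set, but worst-case linear time for a list, which is what I assume your function sieve_erat returns.
See the TimeComplexity page on the Python wiki.
By the way, you can speed things up by exploiting the fact that the list of primes is sorted (sieve_erat generates it in order).
Possible duplicate of "Why is converting a list to a set faster than using just list to compute a list difference?"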
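The comment about the sorted list does not say which technique it has in mind. One possibility is a binary search, which needs about log2(n) comparisons per lookup instead of a full scan. The sketch below is written for Python 3 and uses the standard-library bisect module; the helper name contains_sorted and the short sample list are illustrative assumptions, not part of the original code.

```python
from bisect import bisect_left


def contains_sorted(sorted_values, target):
    """Membership test on an ascending list using binary search."""
    # bisect_left returns the leftmost index where target could be inserted
    # while keeping the list sorted.
    i = bisect_left(sorted_values, target)
    # target is present only if that index is in range and holds target.
    return i < len(sorted_values) and sorted_values[i] == target


# Hypothetical sample data: the first ten primes, already sorted.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

print(contains_sorted(primes, 17))  # True
print(contains_sorted(primes, 18))  # False
```

A set lookup is still cheaper on average, but this variant needs no second copy of the data.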

Tags: python performance set

【Solution 1】:

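I believe the culprit here is if int(a) not in b:. Sets are implemented internally as hash tables, which means a membership check is far cheaper than with a list (you only need to compare against entries that collide with the same hash).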

You can read about the internals of sets here.
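To make the difference concrete, the following sketch counts how many equality comparisons a single membership test triggers in a list and in a set. It is written for Python 3; the wrapper class Counted and the helper count_eq_calls are hypothetical names introduced only for this illustration, and the size of 1000 elements is arbitrary.

```python
class Counted:
    """Wraps an int and counts how often == is evaluated on instances."""

    eq_calls = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Hash like the wrapped int so that set lookups can find a match.
        return hash(self.value)

    def __eq__(self, other):
        Counted.eq_calls += 1
        return self.value == other.value


def count_eq_calls(container, probe):
    """Return (found, number of == calls) for `probe in container`."""
    Counted.eq_calls = 0
    found = probe in container
    return found, Counted.eq_calls


items = [Counted(i) for i in range(1000)]
as_set = set(items)
probe = Counted(999)  # equal to the last list element, but a distinct object

found_list, list_calls = count_eq_calls(items, probe)
found_set, set_calls = count_eq_calls(as_set, probe)

print("list:", found_list, list_calls)  # the list compares element by element
print("set: ", found_set, set_calls)    # the set compares only on a hash match
```

In the question's code the list holds every prime below one million, and each prime triggers up to one lookup per digit, so the per-lookup scan cost multiplies quickly.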

【Discussion】:

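@hlove Yes, as jmduke said. With a list, a membership test scans the list (possibly all of it); with a set, it is an operation with constant execution time on average, because the set uses the hash as a kind of database index.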
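The "hash as an index" idea can be sketched with a toy structure that files each value under a bucket chosen by its hash, so a lookup only inspects one small bucket. This is an illustration under simplifying assumptions, written for Python 3: the class ToyHashSet is hypothetical, it never resizes, and CPython's real set uses open addressing rather than bucket lists.

```python
class ToyHashSet:
    """Illustrative bucket-based set; not how CPython implements set."""

    def __init__(self, values, n_buckets=64):
        # One small list per bucket; the bucket count stays fixed here.
        self.buckets = [[] for _ in range(n_buckets)]
        for value in values:
            self.add(value)

    def _bucket(self, value):
        # The hash picks the bucket, much like an index picks a page.
        return self.buckets[hash(value) % len(self.buckets)]

    def add(self, value):
        bucket = self._bucket(value)
        if value not in bucket:
            bucket.append(value)

    def __contains__(self, value):
        # Only the one bucket selected by the hash is scanned.
        return value in self._bucket(value)


# Hypothetical sample data: the first ten primes.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
toy = ToyHashSet(primes)

print(17 in toy)             # True
print(18 in toy)             # False
print(len(toy._bucket(17)))  # 1: the lookup for 17 inspects a single entry
```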