【Question title】: binarySearch vs in, Unexpected Results (Python)
【Posted】: 2015-11-03 09:02:39
【Question】:

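I am trying to compare the complexity of in and binarySearch in Python 2. I expected in to be O(1) and binarySearch to be O(log n). However, the results are unexpected. Is the program's timing wrong, or is there some other mistake?

The code is as follows: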

import time

x = [x for x in range(1000000)]
def Time_in(alist,item):
    t1  = time.time()
    found = item in alist
    t2 = time.time()
    timer = t2 - t1  
    return found, timer

def Time_binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False 
    t1 = time.time()
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    t2 = time.time()
    timer = t2 - t1
    return found, timer

print "binarySearch: ", Time_binarySearch(x, 600000)
print "in: ", Time_in(x, 600000)

The results are:

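【Comments】:

  • Your screenshot shows that binary_search is too fast to measure, while in is fairly fast but measurable, and therefore slower. That looks like exactly what you would expect, doesn't it?
  • I thought O(1) should be faster than binarySearch's O(log n).
  • Oh, x in list(...) is O(N), while x in set(...) is O(1). That is where you went astray...
  • in is O(n). (Dicts/sets can claim O(1), but O(log n) is more realistic.)
  • In standard Python, you should use the timeit module for timing tests rather than time.time().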
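The timeit suggestion in the last comment can be sketched roughly as follows. This is only an illustration, not the asker's code: the helper name `per_call_seconds`, the list size, and the repeat counts are assumptions chosen for the example, and it is written for Python 3.

```python
import timeit


def per_call_seconds(stmt, setup, number=200, repeat=3):
    """Return the best observed time for one execution of stmt, in seconds."""
    # timeit.repeat runs stmt `number` times per trial and returns one
    # total per trial; the minimum is the least disturbed measurement.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


# Hypothetical test data: the same 100000 integers as a list and as a set.
setup = "alist = list(range(100000)); aset = set(alist)"

list_time = per_call_seconds("99999 in alist", setup)
set_time = per_call_seconds("99999 in aset", setup)

print("list: {0:.3e} s per call".format(list_time))
print("set:  {0:.3e} s per call".format(set_time))
```

Repeating the statement many times and dividing by the count avoids the problem in the question, where a single fast call falls below the resolution of time.time() and shows up as 0.0.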

Tags: python time-complexity binary-search


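【Answer 1】:

Binary search is so fast that when you try to print the time it took, it just prints 0.0. Using in, on the other hand, takes long enough that you can see a small fraction of a second elapse.

in takes longer because this is a list, not a set or a similar data structure. For a set, which is a hash table, membership testing is O(1) on average, whereas in a list every element must be checked in order until there is a match or the list is exhausted.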
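To make the difference concrete, here is a small sketch that counts the work done by each approach on the list from the question. The function names `linear_contains` and `binary_contains` are made up for this illustration; `linear_contains` only imitates the left-to-right scan that in performs on a list and is not how CPython implements it internally.

```python
def linear_contains(alist, item):
    """Scan left to right, as `item in alist` does; return (found, comparisons)."""
    comparisons = 0
    for element in alist:
        comparisons += 1
        if element == item:
            return True, comparisons
    return False, comparisons


def binary_contains(sorted_list, item):
    """Binary search on a sorted list; return (found, probes)."""
    first, last = 0, len(sorted_list) - 1
    probes = 0
    while first <= last:
        midpoint = (first + last) // 2
        probes += 1
        if sorted_list[midpoint] == item:
            return True, probes
        if item < sorted_list[midpoint]:
            last = midpoint - 1
        else:
            first = midpoint + 1
    return False, probes


x = list(range(1000000))
print(linear_contains(x, 600000))  # needs 600001 comparisons
print(binary_contains(x, 600000))  # needs at most 20 probes for 1000000 items
```

The linear scan has to pass every element before index 600000, while binary search halves the remaining range at each probe, which is why its time is too small to show up in a single time.time() measurement.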

Here is some benchmark code:

from __future__ import print_function

import bisect
import timeit


def binarysearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return found


def bisect_index(alist, item):
    idx = bisect.bisect_left(alist, item)
    if idx != len(alist) and alist[idx] == item:
        found = True
    else:
        found = False
    return found


time_tests = [
    ('    600 in list(range(1000))',
     '600 in alist',
     'alist = list(range(1000))'),
    ('    600 in list(range(10000000))',
     '600 in alist',
     'alist = list(range(10000000))'),

    ('    600 in set(range(1000))',
     '600 in aset',
     'aset = set(range(1000))'),
    ('6000000 in set(range(10000000))',
     '6000000 in aset',
     'aset = set(range(10000000))'),

    ('binarysearch(list(range(1000)), 600)',
     'binarysearch(alist, 600)',
     'from __main__ import binarysearch; alist = list(range(1000))'),
    ('binarysearch(list(range(10000000)), 6000000)',
     'binarysearch(alist, 6000000)',
     'from __main__ import binarysearch; alist = list(range(10000000))'),

    ('bisect_index(list(range(1000)), 600)',
     'bisect_index(alist, 600)',
     'from __main__ import bisect_index; alist = list(range(1000))'),
    ('bisect_index(list(range(10000000)), 6000000)',
     'bisect_index(alist, 6000000)',
     'from __main__ import bisect_index; alist = list(range(10000000))'),
    ]

for display, statement, setup in time_tests:
    result = timeit.timeit(statement, setup, number=1000000)
    print('{0:<45}{1}'.format(display, result))

Results (total seconds for 1,000,000 executions of each statement):

# Python 2.7

    600 in list(range(1000))                 5.29039907455
    600 in list(range(10000000))             5.22499394417
    600 in set(range(1000))                  0.0402979850769
6000000 in set(range(10000000))              0.0390179157257
binarysearch(list(range(1000)), 600)         0.961972951889
binarysearch(list(range(10000000)), 6000000) 3.014950037
bisect_index(list(range(1000)), 600)         0.421462059021
bisect_index(list(range(10000000)), 6000000) 0.634694814682

# Python 3.4

    600 in list(range(1000))                 8.578510413994081
    600 in list(range(10000000))             8.578105041990057
    600 in set(range(1000))                  0.04088461003266275
6000000 in set(range(10000000))              0.043901249999180436
binarysearch(list(range(1000)), 600)         1.6799193460028619
binarysearch(list(range(10000000)), 6000000) 6.099467994994484
bisect_index(list(range(1000)), 600)         0.5168328559957445
bisect_index(list(range(10000000)), 6000000) 0.7694612839259207

# PyPy 2.6.0 (Python 2.7.9)

    600 in list(range(1000))                 0.122292041779
    600 in list(range(10000000))             0.00196599960327
    600 in set(range(1000))                  0.101480007172
6000000 in set(range(10000000))              0.00759720802307
binarysearch(list(range(1000)), 600)         0.242530822754
binarysearch(list(range(10000000)), 6000000) 0.189949035645
bisect_index(list(range(1000)), 600)         0.132127046585
bisect_index(list(range(10000000)), 6000000) 0.197204828262
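One detail worth keeping in mind when reading the list rows: both list tests search for 600, which sits at index 600 regardless of how long the list is, so on CPython the two list timings come out nearly identical. The sketch below is not part of the original benchmark; the list size and the repeat count are assumptions chosen to keep it quick. It times a list lookup for a target near the front and for a target at the very end.

```python
import timeit

# Hypothetical setup: one list of 1000000 integers shared by both statements.
setup = "alist = list(range(1000000))"

# Target near the front: the scan stops after about 600 elements.
early = timeit.timeit("600 in alist", setup=setup, number=20)

# Target at the end: the scan has to walk the whole list.
late = timeit.timeit("999999 in alist", setup=setup, number=20)

print("target at index 600:    {0:.6f} s".format(early))
print("target at index 999999: {0:.6f} s".format(late))
```

The cost of in on a list grows with the position of the match (or with the full length when there is no match), not with the length of the list as such.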


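【Answer 2】:

Why would testing whether an element is contained in a list take O(1)? If you know nothing about the list (for example, that it is sorted, as in your example), then you have to go through every element and compare it.

So you get O(N).

Python lists cannot make assumptions about what you store in them, so they have to use a naive implementation for list.__contains__. If you want a faster test, you can try using a dictionary or a set.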
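The behaviour of list.__contains__ can be observed directly by counting equality comparisons. The `Probe` class below is a hypothetical helper written only for this illustration; it wraps a value, hashes like that value, and records how often it is compared.

```python
class Probe(object):
    """Wraps a value and counts the equality comparisons it takes part in."""

    def __init__(self, value):
        self.value = value
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return self.value == other

    def __hash__(self):
        # Hash like the wrapped value so a set looks in the same slot.
        return hash(self.value)


alist = list(range(1000))
aset = set(alist)

in_list = Probe(600)
print(in_list in alist, in_list.eq_calls)  # one comparison per element scanned

in_set = Probe(600)
print(in_set in aset, in_set.eq_calls)  # the hash narrows the search first
```

The list lookup compares the probe against every element from index 0 up to the match, while the set uses the hash to jump close to the matching entry and needs only a handful of comparisons at most.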


【Answer 3】:

Here are the time complexities of all list methods in Python:

So you can see that x in s is O(n), which is much slower than binarySearch's O(log n).
