How to do a binary search for a range of the same value? Answers

【Question title】: How to do a binary search for a range of the same value?
【Posted】: 2015-08-27 23:02:06
【Question】:

I have a sorted list of numbers, and I need to return the range of indices at which a given number appears. My list is:

daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]

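If I search for 0, I need to get back (0, 3). Right now I can only find the position of a single occurrence! I know how to do a binary search, but I'm stuck on how to make it move up and down from that position to find the other equal values!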

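def find_one(daysSick, value):
    low = 0
    high = len(daysSick) - 1
    while low <= high:
        mid = (low + high) // 2
        if value < daysSick[mid]:
            high = mid - 1
        elif value > daysSick[mid]:
            low = mid + 1
        else:
            return mid  # index of one occurrence, not necessarily the first
    return None  # value not found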

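【Question comments】:

Is there any reason you're not using any of Python's built-in functions?
@TerranceSeo, hg.python.org/cpython/file/2.7/Lib/bisect.py, the bisect module source contains simple pure-Python bisection functions

Tags: python binary-search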

【Solution 1】:

Why not use Python's bisection routines:

>>> daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
>>> from bisect import bisect_left, bisect_right
>>> bisect_left(daysSick, 3)
6
>>> bisect_right(daysSick, 3)
9
>>> daysSick[6:9]
[3, 3, 3]
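
Note that bisect_right returns the index one past the last match, so an inclusive (first, last) pair needs a subtraction of 1. A minimal sketch of that adjustment follows; `find_range` is a hypothetical helper name, and returning None for a missing value is an assumption, not something the question specifies:

```python
from bisect import bisect_left, bisect_right

def find_range(seq, value):
    """Return inclusive (first, last) indices of value in sorted seq, or None."""
    first = bisect_left(seq, value)        # index of the first element >= value
    last = bisect_right(seq, value) - 1    # index of the last element <= value
    if first > last:                       # empty span: value does not occur
        return None
    return first, last

daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(find_range(daysSick, 0))  # (0, 3)
print(find_range(daysSick, 3))  # (6, 8)
print(find_range(daysSick, 7))  # None
```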

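【Comments】:

@TerranceSeo, see the link in the comments; there's nothing special about the bisect functions, they are implemented in pure Python.
@wim The OP wants the actual indices, i.e. 6, 8 - you need to subtract 1 from the result of bisect_right()
@PadraicCunningham Not correct, they are implemented in C. There is a reference Python implementation as a fallback, but it isn't normally used.
The C implementation is used if available. The OP is looking for the algorithm in the link I posted.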

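【Solution 2】:

I came up with a solution that is faster than the raw pure-Python functions taken from the bisect library source

Solution

Using an optimized binary search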

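def search(a, x):
    # Binary search for the right edge: first index whose element is > x
    right = 0
    h = len(a)
    while right < h:
        m = (right + h) // 2
        if x < a[m]:
            h = m
        else:
            right = m + 1
    # Guard (assumed behavior): report a missing value as None
    if right == 0 or a[right - 1] != x:
        return None
    # Binary search for the left edge only,
    # restricted to indices 0 .. right-1 - much faster!
    left = 0
    h = right - 1
    while left < h:
        m = (left + h) // 2
        if x > a[m]:
            left = m + 1
        else:
            h = m
    return left, right - 1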

search(daysSick, 5)
(10, 12)

search(daysSick, 2)
(5, 5)

Comparison with `bisect`

Using the custom binary search...

%timeit search(daysSick, 3)
1000000 loops, best of 3: 1.23 µs per loop

Using the pure-Python source code copied from bisect...

%timeit bisect_left(daysSick, 1), bisect_right(daysSick, 1)
1000000 loops, best of 3: 1.77 µs per loop
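
For reference, the pure-Python functions being timed here look roughly like the sketch below. It is modeled on the reference implementation in Lib/bisect.py rather than copied verbatim, and the `_py` suffixes are hypothetical names used only to avoid clashing with the imported versions:

```python
def bisect_left_py(a, x, lo=0, hi=None):
    """Leftmost insertion point for x in sorted a (first index with a[i] >= x)."""
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1   # everything up to mid is smaller than x
        else:
            hi = mid       # a[mid] >= x, so the answer is at mid or to its left
    return lo

def bisect_right_py(a, x, lo=0, hi=None):
    """Rightmost insertion point for x in sorted a (first index with a[i] > x)."""
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid       # a[mid] > x, so the answer is at mid or to its left
        else:
            lo = mid + 1   # everything up to mid is <= x
    return lo

daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(bisect_left_py(daysSick, 1), bisect_right_py(daysSick, 1))  # 4 5
```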

Using the default import is by far the fastest, presumably because it is optimized behind the scenes...

from bisect import bisect_left, bisect_right
%timeit bisect_left(daysSick, 1), bisect_right(daysSick, 1)
1000000 loops, best of 3: 504 ns per loop
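
`%timeit` is an IPython magic. Outside IPython, a similar measurement can be sketched with the standard `timeit` module. The helper name `both_bounds` and the loop count are arbitrary choices for illustration, and the numbers will differ by machine and Python version:

```python
import timeit
from bisect import bisect_left, bisect_right

daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]

def both_bounds():
    # Same expression as in the %timeit line: both edges of the run of 1s
    return bisect_left(daysSick, 1), bisect_right(daysSick, 1)

n = 100_000                                   # number of calls to time
total = timeit.timeit(both_bounds, number=n)  # total seconds for n calls
print(f"{total / n * 1e9:.0f} ns per call")
```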

Extra

Without external libraries, but not a binary search

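daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]

# using a function: collect every index whose element equals val (linear scan)
idxL = lambda val, lst: [i for i, d in enumerate(lst) if d == val]

allVals = idxL(0, daysSick)
# first and last matching index
(allVals[0], allVals[-1])
(0, 3)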

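【Comments】:

Well, then the OP's binary search isn't needed, because you scan the whole list here.
@Alik I posted before he said the answer had to include a binary search. My solution answers the question without importing a library
@Terrance Seo - I've edited to include an optimized binary search for your problem
It looks like the bisect module is more than twice as fast. Am I misreading your results?!
@wim That's because the built-in module is native C code. That's why I copied the bisect Python source for a fair comparison. If you took my optimized code and translated it to C / Fortran, it would beat the built-in module, as the Python-vs-Python comparison shows.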

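【Solution 3】:

OK, here is another approach: it first tries to narrow the range, and then runs bisect_left and bisect_right on the two halves of the narrowed range. I wrote this code because I think it is slightly more efficient than simply calling bisect_left and bisect_right, even though it has the same time complexity.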

def binary_range_search(s, x):
    # First we will reduce the low..high range if possible
    # by using symmetric binary search to find an index pointing to x
    low, high = 0, len(s)
    while True:
        if low >= high:
            return None
        mid = (low + high) // 2
        mid_element = s[mid]
        if x == mid_element:
            break
        elif x < mid_element:
            high = mid
        else:
            low = mid + 1
    xindex = mid

    # Now we have found an index pointing to x called xindex
    # and potentially reduced the low..high range
    # now we can run bisect_left on low..xindex + 1

    lo, hi = low, xindex + 1
    while lo < hi:
        mid = (lo+hi)//2
        if x > s[mid]: lo = mid+1
        else: hi = mid
    first = lo

    # and also bisect_right on xindex..high

    lo, hi = xindex, high
    while lo < hi:
        mid = (lo+hi)//2
        if x < s[mid]: hi = mid
        else: lo = mid+1
    last = lo - 1

    return first, last

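I believe the time complexity is O(log n), just like the simple solution, but I expect this to be somewhat more efficient anyway. It is also worth noting that the bisect_left and bisect_right steps in the second part could run in parallel for large data sets, since they are independent operations that do not interact.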
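
The same narrowing idea can also be sketched with the standard library doing the second phase, since bisect_left and bisect_right accept optional `lo` and `hi` bounds. This is only an illustration under that assumption; `binary_range_search_bisect` is a hypothetical name, and it has not been benchmarked against the version above:

```python
from bisect import bisect_left, bisect_right

def binary_range_search_bisect(s, x):
    """Return inclusive (first, last) indices of x in sorted s, or None."""
    # Phase 1: plain binary search for any occurrence of x,
    # shrinking the half-open window [low, high) along the way.
    low, high = 0, len(s)
    while low < high:
        mid = (low + high) // 2
        if s[mid] == x:
            break
        elif x < s[mid]:
            high = mid
        else:
            low = mid + 1
    else:
        return None  # window became empty without a hit: x is absent

    # Phase 2: the left edge lies in [low, mid], the right edge in [mid, high).
    first = bisect_left(s, x, low, mid + 1)
    last = bisect_right(s, x, mid, high) - 1
    return first, last

daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(binary_range_search_bisect(daysSick, 5))  # (10, 12)
print(binary_range_search_bisect(daysSick, 7))  # None
```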

【Comments】:
