My implementation of binary search through two lists is exorbitantly slow: answers

【Question title】: My implementation of binary search through two lists is exorbitantly slow
【Posted】: 2021-10-30 04:23:27
【Question】:

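For an algorithms course I'm taking, we are implementing some algorithms and testing their speed. I chose Python as my language for this. We have 2 unsorted lists S1 and S2 and a number x, and we want to find out whether there is any element a in S1 and b in S2 such that a + b = x. My approach is this: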

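def find_in(s, s2):
    # Binary search for s in s2, which must be sorted in ascending order.
    start, end = 0, len(s2) - 1
    while end >= start:
        mid = start + (end - start) // 2
        if s2[mid] == s:
            return True
        if s2[mid] > s:
            end = mid - 1    # target can only be in the left half
        else:
            start = mid + 1  # target can only be in the right half
    return False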

# `timing` is the asker's own decorator; its definition is not shown.
@timing
def binary_search(x, s1: list, s2: list) -> bool:
    return any(find_in(x - s, sorted(s2)) for s in s1)

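So the function loops over one unsorted list and uses binary search to look for the element x - s in the sorted list. For whatever reason, with lists of length 10000 generated using Python's random module, it takes about 10 seconds on average, which is longer than the brute-force approach I tried. Is there some subtlety I'm missing in what I wrote? I feel like this should be O(n log n), which is faster than O(n²).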
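The `timing` decorator used above is not shown in the question. For anyone who wants to reproduce the measurement, here is a minimal sketch of what such a decorator might look like, assuming it simply prints the elapsed wall-clock time of each call. The body of `timing` and the `demo` function are hypothetical stand-ins, not the asker's code.

```python
import functools
import time


def timing(func):
    """Hypothetical stand-in for the question's @timing decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Report how long this call took, then pass the result through.
        print(f"{func.__name__} took {elapsed:.4f} s")
        return result
    return wrapper


@timing
def demo(n):
    # Illustrative workload only.
    return sum(range(n))


print(demo(1000))
```

`time.perf_counter` is used here because it is a monotonic, high-resolution clock intended for measuring short durations.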

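【Comments】:

I'm far from a Python expert, but I assume the sorted(s2) bit is re-evaluated for every s in the comprehension, which means Python has to re-sort the list n times. If you pull it out onto the previous line, the function should be much faster.
Yes, the problem here is that you're sorting the list on every iteration. Python's sorting algorithm is optimised for common cases such as sorting an already-sorted list, but calling sorted on an already-sorted list still takes O(n) time, so it adds up to O(n^2) because you're sorting O(n) times.
@kaya3: sorted returns a new list rather than mutating the input, so each sorted call performs a full sort from scratch.
@user2357112supportsMonica Oh, right, brain fart! So it adds up to O(n^2 log n).
I see, that makes sense. It's much faster when I don't sort the list every time.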
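To make the fix from the comments concrete: sorting inside the generator costs O(n log n) per element of s1, so n elements give O(n² log n) overall, whereas sorting once costs O(n log n) and the n binary searches add another O(n log n). Below is a minimal sketch of the sort-once version. The names `contains_sorted` and `has_pair_sum` are made up for illustration, and the standard-library `bisect` module is swapped in for the hand-written search loop.

```python
from bisect import bisect_left


def contains_sorted(sorted_values, target):
    """Return True if target occurs in sorted_values (ascending order)."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


def has_pair_sum(x, s1, s2):
    """Return True if some a in s1 and b in s2 satisfy a + b == x."""
    s2_sorted = sorted(s2)  # sort once, outside the loop
    return any(contains_sorted(s2_sorted, x - a) for a in s1)


print(has_pair_sum(10, [7, 1, 4], [9, 2, 5]))  # 1 + 9 == 10, so this prints True
```

The only structural change from the question's code is that `sorted(s2)` is evaluated a single time and its result is reused for every lookup.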

Tags: python algorithm binary-search

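【Solution 1】:

Logic:

Given a + b = x,
this means b = x - a (we know x, we know a, and we need to find out whether b exists).
It also means a = x - b (we know x, we know b, and we need to find out whether a exists). {I guess this part is not required}

Code: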

def AplusB(arr1, arr2, x):
    # Store the frequency of each value in arr1 (not required, I guess).
    d1 = {}
    for i in arr1:
        if i in d1:
            d1[i] += 1
        else:
            d1[i] = 1
    # Store the frequency of each value in arr2.
    d2 = {}
    for i in arr2:
        if i in d2:
            d2[i] += 1
        else:
            d2[i] = 1
    isAnsExists = False
    # For each a in arr1, check whether b = x - a occurs in arr2.
    for i in arr1:
        val = x - i
        if val in d2:
            isAnsExists = True
    # Symmetric check: for each b in arr2, look for a = x - b in arr1.
    for i in arr2:
        val = x - i
        if val in d1:
            isAnsExists = True
    return isAnsExists

Random test:

import random

x = random.randint(0, 10000)
arr1 = [random.randint(0, 100000) for i in range(100000)]
arr2 = [random.randint(0, 100000) for i in range(100000)]
print(AplusB(arr1, arr2, x))

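【Comments】:

Yes, this solution works too, and it is actually covered in a different part of the assignment. For this specific case, I need to use binary search on the two unsorted lists to show the difference between the algorithms' running times. Unfortunately, this part of the assignment forbids hash tables, sets, or other data structures with O(1) lookup.
Ohkkk, cool ....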