Finding smallest number >= x not present in the given sorted array: answers

【Question title】: Finding smallest number >= x not present in the given sorted array
【Posted】: 2021-06-25 03:39:29
【Question description】:

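I am having trouble writing a modified binary search that returns the smallest number greater than or equal to X that does not appear in a sorted array.

For example, if the array is {1,2,3,5,6} and x = 2, the answer is 4. Please guide me on how to write the binary search for this. I have to answer each x in O(log n) time. Since I receive the array as input, which already takes linear time, you may preprocess the array in whatever way you need.

x is also given as input and may or may not be present in the array.

The input array may contain duplicate elements.

My input numbers can be in the range [0,10^9], so putting all the missing values into an array up front is not feasible because of space constraints.

Also, preprocessing that takes O(n) time is allowed, because reading the array is linear anyway. After that there will be, say, 10^6 queries for X, and each of them must be answered in O(log n) time.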
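
To pin down the expected results, including for arrays with duplicates, a plain linear scan can serve as a reference. This is only a sketch for cross-checking faster solutions: it takes O(n) per query, so it does not meet the O(log n) requirement, and the name first_missing_linear is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical reference implementation: O(n) per query, too slow for
// 10^6 queries, but useful for checking a faster solution.
int first_missing_linear(const std::vector<int>& sorted, int x) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    int answer = x;
    // Start at the first element >= x and walk right while the candidate is taken.
    for (auto it = std::lower_bound(sorted.begin(), sorted.end(), x);
         it != sorted.end(); ++it) {
        if (*it == answer) {
            ++answer;      // candidate is present, try the next value
        } else if (*it > answer) {
            break;         // gap found: answer is not in the array
        }
        // *it < answer: duplicate of a value that was already skipped
    }
    return answer;
}
```

For {1,2,3,5,6} and x = 2 this walks over 2 and 3 and stops at 5, returning 4.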

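【Discussion】:

You can check whether the elements of the sorted array are contiguous (like {1, 2, 3, 4, 5}) by testing whether the number of elements equals (the rightmost element) - (the leftmost element) + 1.
Is your array sorted?
It is sorted, but it may contain duplicate elements.
Transform the input into "segments": {1,2,3,5,6} -> {{1,3}, {5,6}}, then lower_bound can select the right segment.
@Jarod42 I like the idea, but could you tell me what value to search for with lower_bound? Would something like {X, +INF} work?

Tags: c++ sorting binary-search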

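【Solution 1】:

If I understand correctly, you may do any kind of preprocessing, and only finding the result for different x has to be O(log n). If that is the case, finding the result after preprocessing is not a big deal. O(log n) search algorithms do exist. Good candidates are std::binary_search or std::lower_bound.

A very naive approach is to prepare a vector containing all the missing elements and then call std::lower_bound on it: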

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> input{1,2,3,5,6,10,12};
    std::vector<int> missing_elements{4,7,8,9,11};
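    int x = 2;
    int result = x;  // correct whenever x lies outside [input.front(), input.back()]
    if (x >= input.front() && x <= input.back()) {
        auto it = std::lower_bound(missing_elements.begin(),missing_elements.end(),x);
        // No missing element >= x means x sits in the last run of the input.
        result = (it == missing_elements.end()) ? input.back()+1 : *it;
    }
    std::cout << result << "\n";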
}

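Populating missing_elements can be done in time linear in the number of missing values, which depends on the value range rather than on n. A missing_elements of size 10^9 is of course not feasible. Moreover, this approach is very wasteful, in both runtime and memory usage, for an input like [1,100000000].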
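
For completeness, here is a minimal sketch of how missing_elements could be filled from the sorted input. The helper name collect_missing is hypothetical, and the sketch assumes the values stay well below INT_MAX (as with the [0,10^9] range from the question). Its cost grows with the number of gaps it writes out, which is exactly why it does not scale to large value ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: lists every value strictly between input.front() and
// input.back() that does not occur in the sorted input.
std::vector<int> collect_missing(const std::vector<int>& input) {
    assert(std::is_sorted(input.begin(), input.end()));
    std::vector<int> missing;
    for (std::size_t i = 1; i < input.size(); ++i) {
        // Duplicates and direct successors leave no gap, so the inner loop
        // runs zero times for them.
        for (int v = input[i - 1] + 1; v < input[i]; ++v) {
            missing.push_back(v);
        }
    }
    return missing;
}
```

For {1,2,3,5,6,10,12} this yields {4,7,8,9,11}, the vector that is hard-coded in the example above.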

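One idea, proposed by Jarod42 in the comments, is to prepare a vector of segments and then call std::lower_bound on it. Assume for now that the preprocessing has already been done: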

#include <iostream>
#include <vector>
#include <algorithm>

int find_first_missing(const std::vector<std::pair<int,int>>& segments,int x){
    std::pair<int,int> p{x,x};
    auto it = std::lower_bound(segments.begin(),segments.end(),p,[](auto a,auto b){
        return a.second < b.second;
    });
    if (it == segments.end()) return x;
    if (it->first > x) return x;    
    return it->second+1;
}

int main() {
    std::vector<int> input{1,2,3,5,6,10,12};
    std::vector<std::pair<int,int>> segments{{1,3},{5,6},{10,10},{12,12}};
    for (int x=0; x<13;++x) std::cout << x << " -> " << find_first_missing(segments,x) << "\n";
}

Output:

0 -> 0
1 -> 4
2 -> 4
3 -> 4
4 -> 4
5 -> 7
6 -> 7
7 -> 7
8 -> 8
9 -> 9
10 -> 11
11 -> 11
12 -> 13

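Because input is sorted, segments is sorted as well, so we can use a custom comparator that compares only the ends of the segments. The vector of segments is also sorted with respect to that comparator. The call to lower_bound returns an iterator to the first segment whose end is not less than x, that is, the segment that either contains x or lies entirely above x. In the latter case it->first > x and x itself is missing, hence if (it->first > x) return x;. Otherwise x lies inside the segment and it->second+1 is the next missing number.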
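
The same lookup can also be written without building a dummy pair, because std::lower_bound only ever calls its comparator as comp(element, value). The sketch below is one possible variant, not the answer's own code; the name find_first_missing_by_value is made up for illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch: same query as find_first_missing, but searching with the plain
// value x instead of a {x, x} pair.
int find_first_missing_by_value(
        const std::vector<std::pair<int, int>>& segments, int x) {
    // First segment whose end is >= x. The comparator receives
    // (segment, value) in that order.
    auto it = std::lower_bound(
        segments.begin(), segments.end(), x,
        [](const std::pair<int, int>& seg, int value) {
            return seg.second < value;
        });
    if (it == segments.end() || it->first > x) {
        return x;            // x falls into a gap or lies beyond all segments
    }
    return it->second + 1;   // x lies inside *it
}
```

This also addresses the question of which value to search for: with a comparator that looks only at the segment end, x itself is enough and no {X, +INF} sentinel is needed.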

Now all that is left is to create the vector of segments:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cassert>

std::vector<std::pair<int,int>> segment(const std::vector<int>& input){
    std::vector<std::pair<int,int>> result;
    if (input.size() == 0) return result;
    
    int current_start = input[0];
    for (std::size_t i=1;i<input.size();++i){  // size_t avoids a signed/unsigned comparison
        if (input[i-1] == input[i] || input[i-1]+1 == input[i]) continue;
        result.push_back({current_start,input[i-1]});
        current_start = input[i];        
    }
    result.push_back({current_start,input.back()});
    return result;
}

int main() {
    std::vector<int> input{1,2,3,5,6,10,12};
    std::vector<std::pair<int,int>> expected{{1,3},{5,6},{10,10},{12,12}};
    auto result = segment(input);
    for (const auto& e : result){
        std::cout << e.first << " " << e.second << "\n";
    }
    assert(expected == result);
}

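【Discussion】:

"can be done in linear time" My input is {1, 1000000000000}, go on.
@n.1.8e9-where's-my-sharem. I guess you are just making fun of me. An O(1) check for normal input can be added, and for a pathological input like yours, finding the result is also only O(1). I find focusing on complexity silly. Unless you have to design a general-purpose algorithm library, complexity is largely irrelevant. I suppose this is just the effect of an academic exercise. If I had to write production code for this problem, I would want to know the time limits and more about the input; whether I then use O(n^2) or O(log n) is secondary.
"An O(1) check for whether the input is normal can be added" Then try adding it and I will break it. "I find focusing on complexity silly." Then maybe say so in the answer, because the question clearly focuses on complexity and expects answers to do the same. Anyway, there is a valid preprocessing suggestion in the comments: preprocess into intervals. You could also preprocess the missing intervals.
@n.1.8e9-where's-my-sharem. But the answer does focus on complexity. It is possible to get the missing numbers of {1, 1000000000000} in linear time, and "If I understand correctly, you may do any kind of preprocessing, and only finding the result for different x has to be O(log n)".
@n.1.8e9-where's-my-sharem. Don't get me wrong, I like to be challenged and I appreciate any criticism, but I am afraid I don't understand your point. Getting missing_elements in linear time is just a side note, because as I understand it, O(log n) is only required for answering queries after the preprocessing.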

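【Solution 2】:

If x is not present in the array, return x.

If x is present, say it is at position l. Also, let missing(i) denote the number of elements missing to the left of position i. In a 1-indexed array with distinct elements, this equals A[i]-i. Then keep moving right from l for as long as missing(i) - missing(l) = 0. You can use a modified binary search for this. Let p be the position of the last element for which missing(p) - missing(l) = 0; then A[p]+1 is the first missing number greater than x.

【Discussion】: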
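
The idea can be sketched as follows. This is one possible reading of the description, not the answerer's own code: it uses 0-based indices, removes duplicates first (the A[i]-i argument needs distinct elements, and the question allows duplicates), and the names distinct_values and first_missing_from are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// O(n) preprocessing (hypothetical helper): drop duplicates so that
// a[i] - i never decreases from one index to the next.
std::vector<int> distinct_values(std::vector<int> sorted) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    return sorted;
}

// O(log n) query on the deduplicated array.
int first_missing_from(const std::vector<int>& a, int x) {
    auto it = std::lower_bound(a.begin(), a.end(), x);
    if (it == a.end() || *it != x) return x;  // x itself is absent

    const std::ptrdiff_t l = it - a.begin();
    const std::ptrdiff_t offset = a[l] - l;   // constant within a run of consecutive values
    // Binary search for the first index after l where the run is broken.
    std::ptrdiff_t lo = l + 1;
    std::ptrdiff_t hi = static_cast<std::ptrdiff_t>(a.size());
    while (lo < hi) {
        const std::ptrdiff_t mid = lo + (hi - lo) / 2;
        if (a[mid] - mid == offset) {
            lo = mid + 1;   // a[l..mid] is still consecutive
        } else {
            hi = mid;       // some value was skipped at or before mid
        }
    }
    return a[lo - 1] + 1;   // one past the last element of the run
}
```

Both steps are a single binary search each, so a query stays within O(log n) after the O(n) deduplication.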

【Solution 3】:

Have a look at this:

    #include <iostream>
    #include <vector>
    
    using namespace std;
    
    int greaterValue(const vector<int>& elements, int x){
        int low = 0, 
            high = elements.size() -1, 
            answer = x;  // start at x itself: the result must be >= x
        
        while (low <= high) {
            int mid = (low + high) / 2;
            if (elements[mid] <= answer) {
                if (elements[mid] == answer) {
                    answer++;
                    high = elements.size() - 1;
                }
                low = mid + 1;
            }
            else {
                high = mid - 1;
            }
        }
        return answer;
    }
    
    int main() {
        vector<int> elements = { 1, 2, 3, 5, 6 };
        int x = 2;
        int result = greaterValue(elements, x);
        cout << "The element is: " << result;
        return 0;
    }

Test:

    { 1, 2, 3, 5, 6 }

Result:

    The element is: 4

Time complexity:

    O(log(n))

【Discussion】:

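This program doesn't work.
@n.1.8e9-where's-my-sharem. For your input ({ 1,1,1,1,5,5,5,5,9 }) the result is 3, which is correct.
The output is "The element is present at index 3"; what does that mean? Maybe only the output message is wrong and it should read "The absent element is 3"? Then have a look at this modified version. I only added step counting.
Not sure this is O(log(n)), as you search for each element (it looks like O(n log(n)) to me).
The algorithm is correct. Basically, when answer is not in elements, it returns answer; but if it is present, the binary search first finds it, sets low to the next position and high to the last position, and starts the binary search again. Consider A = [1, 2, 3, ..., n]: finding 1 takes about log(n) loop iterations, finding 2 takes about log(n-1), and so on. This is bounded by O(n log(n)).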
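
The growth described in the last comment can be observed with a small step counter. The sketch below is an instrumented copy of the restart-style loop, not the code linked earlier in this thread; it starts the search at x as the question's ">= x" requires, and count_search_steps, dense_input and sparse_input are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Returns how often the while loop body runs for a given query,
// instead of returning the answer itself.
long long count_search_steps(const std::vector<int>& elements, int x) {
    int low = 0;
    int high = static_cast<int>(elements.size()) - 1;
    int answer = x;
    long long steps = 0;
    while (low <= high) {
        ++steps;
        int mid = low + (high - low) / 2;
        if (elements[mid] <= answer) {
            if (elements[mid] == answer) {
                ++answer;                                       // candidate is taken
                high = static_cast<int>(elements.size()) - 1;   // restart on the right part
            }
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return steps;
}

// Worst case from the comment above: {1, 2, ..., n}.
std::vector<int> dense_input(int n) {
    assert(n >= 0);
    std::vector<int> v;
    for (int i = 1; i <= n; ++i) v.push_back(i);
    return v;
}

// Friendly case: {1, 3, 5, ..., 2n-1}, so the run after x = 1 ends at once.
std::vector<int> sparse_input(int n) {
    assert(n >= 0);
    std::vector<int> v;
    for (int i = 0; i < n; ++i) v.push_back(2 * i + 1);
    return v;
}
```

Calling count_search_steps(dense_input(n), 1) for growing n shows the step count rising faster than n, while count_search_steps(sparse_input(n), 1) stays within a couple of binary searches. The loop is therefore O(log n) only when the run of consecutive values starting at x is short.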