Binary Search for Prefix in C++: Answers

【Question title】: Binary Search for Prefix in C++
【Posted】: 2021-10-06 16:43:11
【Question】:

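I have a sorted vector of strings in C++.

I want to check whether a candidate string is a prefix of any string in the vector. Because of the size constraints, a linear scan is not an option.

How can I implement a custom comparator to do this?

As I understand it, the current string comparator looks like this: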

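#include <string>

class search_comparator {
public:
    // Plain lexicographic "less than", equivalent to std::less<std::string>.
    bool operator()(const std::string &value, const std::string &element) const
    {
        return value < element;
    }
};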

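Now I know from C++ Reference:

For all elements, if element

But how do I add the prefix check to this comparator?

An alternative is to call lower_bound and then check whether the result starts with the candidate, but I would like to know whether binary_search can be used directly.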
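One way to use std::binary_search directly, offered as a minimal sketch rather than the thread's method: wrap the search key in its own type so the comparator can be called as comp(element, value) and comp(value, element) with distinguishable argument types, and compare only the first prefix.size() characters of each element. This works because std::binary_search only needs the range to be partitioned with respect to the value, not a single total order. It assumes C++17 (std::string_view); the names `Prefix`, `prefix_less` and `has_prefix` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical wrapper: tags the search key as a prefix so overload
// resolution can tell it apart from the vector's elements.
struct Prefix {
    std::string_view text;
};

// Hypothetical heterogeneous comparator that looks only at the first
// text.size() characters of each element.
struct prefix_less {
    // Invoked as comp(element, value).
    bool operator()(std::string_view element, Prefix p) const {
        return element.substr(0, p.text.size()) < p.text;
    }
    // Invoked as comp(value, element).
    bool operator()(Prefix p, std::string_view element) const {
        return p.text < element.substr(0, p.text.size());
    }
};

// True if some string in the sorted vector starts with prefix.
bool has_prefix(const std::vector<std::string>& sorted, std::string_view prefix) {
    return std::binary_search(sorted.begin(), sorted.end(), Prefix{prefix},
                              prefix_less{});
}
```

Every string sharing the prefix compares "equivalent" to the key, and in a sorted vector those strings are contiguous, so the partition requirement holds.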

My own solution using lower_bound is below (on my machine, at least, it is faster than a linear scan):

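for (int i = 0; i < n; i++) {
    std::getline(std::cin, needle);
    // First element >= needle; if any string starts with needle, it is this one.
    auto found = std::lower_bound(haystack.begin(), haystack.end(), needle);
    // rfind(needle, 0) can only match at position 0, i.e. a prefix match.
    if (found != haystack.end() && found->rfind(needle, 0) != std::string::npos) {
        count++;
    }
}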
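The loop above can be packaged as a self-contained function for testing. This is a sketch assuming `haystack` is already sorted with the default `<`; the name `count_prefix_hits` is hypothetical, and the needles are passed in as a vector instead of being read from std::cin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many needles are a prefix of at least one string in haystack.
// Precondition: haystack is sorted with the default operator<.
std::size_t count_prefix_hits(const std::vector<std::string>& haystack,
                              const std::vector<std::string>& needles) {
    std::size_t count = 0;
    for (const std::string& needle : needles) {
        // Smallest element >= needle. Any string with prefix needle is >= needle,
        // and everything between needle and such a string also has the prefix.
        auto found = std::lower_bound(haystack.begin(), haystack.end(), needle);
        // rfind(needle, 0) returns 0 exactly when needle is a prefix of *found.
        if (found != haystack.end() && found->rfind(needle, 0) == 0) {
            ++count;
        }
    }
    return count;
}
```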

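【Question comments】:

How does a sorted vector of strings help you find a string containing a particular substring? Consider a sorted vector of numbers. If you are looking for 10 and the number in the middle of the vector is 5, you know the value (if present) must be in the second half. But if you are looking for a string containing "pqr" and the middle of the vector holds "jkl", how do you know which half to discard? The list could just as well start with "apqr" or end with "zpqr".
I am not sure you can do better than a linear scan. Suppose my vector of strings is {"a", "b", "ba"} and I am searching for "a". "b" does not contain "a" as a substring, but the elements both before and after "b" do!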
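The counterexample can be checked mechanically. Binary search relies on the sorted range being partitioned by the test it applies (all `true` elements before all `false` ones). A small sketch using std::is_partitioned, with hypothetical helper names and assuming C++17, shows that "contains the substring" is not partitioned on {"a", "b", "ba"} (the values are true, false, true, so neither the predicate nor its negation works), while the test a prefix-aware comparator evaluates ("the first |p| characters sort before p") is:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Predicate a substring search would need: does s contain sub?
bool contains_partitioned(const std::vector<std::string>& sorted, std::string_view sub) {
    return std::is_partitioned(sorted.begin(), sorted.end(),
        [sub](const std::string& s) { return s.find(sub) != std::string::npos; });
}

// Predicate a prefix search evaluates: do the first prefix.size()
// characters of s sort before prefix?
bool prefix_partitioned(const std::vector<std::string>& sorted, std::string_view prefix) {
    return std::is_partitioned(sorted.begin(), sorted.end(),
        [prefix](const std::string& s) {
            return std::string_view(s).substr(0, prefix.size()) < prefix;
        });
}
```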
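In that case you are looking for a "prefix", not a "substring". Glad to hear you found a solution; consider posting it as an answer! (You may want to edit the question to say "prefix" so that people with the same problem are more likely to find it.)
Thanks, I didn't know the terminology.
There is no total order that incorporates the prefix restriction. lower_bound followed by a check is the best approach.

Tags: c++ stl binary-search prefix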

【Answer 1】:

I asked something similar a while ago. We can repurpose the good answer from there:

#include <algorithm>
#include <cstdio>
#include <iterator>
#include <ranges>
#include <string>
#include <string_view>

template <std::ranges::range Rng>
[[nodiscard]] constexpr bool pref_exists(Rng const& rng,
                                         std::string_view const pref) noexcept {
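  // Compare only the first n characters, so every string that starts with
  // pref is "equivalent" to pref and the sorted range stays partitioned.
  auto const iter = std::ranges::lower_bound(
      rng, pref,
      [n = pref.size()](std::string_view const a,
                        std::string_view const b) noexcept {
        return a.substr(0, n) < b.substr(0, n);
      });

  // lower_bound lands on the first candidate; confirm it really has the prefix.
  return iter != std::end(rng) && std::string_view{*iter}.starts_with(pref);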
}

int main() {
  std::string words[] = {
      "hello",       "world",   "testing",   "theatergoer", "theatricals",
      "theirselves", "someone", "somewhere", "something",
  };

  std::ranges::sort(words);

  for (auto& pref : {"some", "the", "hal", "aab", "wo"}) {
    std::printf("prefix \"%s\" : does%s exist.\n", pref,
                pref_exists(words, pref) ? "" : "n't");
  }
}

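Each comparison examines at most pref.size() characters, so the search costs O(|pref| · log n) character comparisons, where n is the number of strings in the array. Assuming the prefix is small, that is effectively O(log n).

【Comments】: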

【Answer 2】:

One way to solve this is to use lower_bound, which, as I understand it, performs a binary search on the sorted vector.
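std::lower_bound on random-access iterators needs only O(log n) comparisons because it repeatedly halves the search window. A hand-written equivalent, given here as an illustrative sketch (not how any particular standard library spells it) with hypothetical names `lower_bound_index` and `prefix_exists`, assuming a sorted std::vector<std::string>:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of the first element of sorted that is not less than key,
// or sorted.size() if there is none, using the classic halving loop.
std::size_t lower_bound_index(const std::vector<std::string>& sorted,
                              const std::string& key) {
    std::size_t lo = 0, hi = sorted.size();  // search window [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < key) {
            lo = mid + 1;  // sorted[0..mid] are all < key
        } else {
            hi = mid;      // sorted[mid] may be the answer
        }
    }
    return lo;
}

// Prefix test on top of it, mirroring the rfind(needle, 0) check.
bool prefix_exists(const std::vector<std::string>& sorted, const std::string& needle) {
    std::size_t i = lower_bound_index(sorted, needle);
    return i < sorted.size() && sorted[i].rfind(needle, 0) == 0;
}
```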

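for (int i = 0; i < n; i++) {
    std::getline(std::cin, needle);
    // First element >= needle; if any string starts with needle, it is this one.
    auto found = std::lower_bound(haystack.begin(), haystack.end(), needle);
    // rfind(needle, 0) can only match at position 0, i.e. a prefix match.
    if (found != haystack.end() && found->rfind(needle, 0) != std::string::npos) {
        count++;
    }
}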

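If anyone has a more elegant or faster solution, feel free to edit and improve this.

Following @Sneftel's comment, I use rfind(needle, 0) so that only genuine prefixes match.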
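rfind(needle, 0) works as a prefix test because a backward search starting at position 0 can only report a match at index 0. Two other standard ways to express the same check are compare(0, n, needle) == 0 and, from C++20, starts_with. A small sketch of all three (the function names are hypothetical; the starts_with version is guarded by the standard feature-test macro):

```cpp
#include <cassert>
#include <string>

// Can only match at index 0, so it returns 0 exactly when needle is a prefix.
bool prefix_via_rfind(const std::string& s, const std::string& needle) {
    return s.rfind(needle, 0) == 0;
}

// Compares the first needle.size() characters of s with needle
// (the count is clamped when s is shorter, which then compares unequal).
bool prefix_via_compare(const std::string& s, const std::string& needle) {
    return s.compare(0, needle.size(), needle) == 0;
}

#ifdef __cpp_lib_starts_ends_with
// C++20: states the intent directly.
bool prefix_via_starts_with(const std::string& s, const std::string& needle) {
    return s.starts_with(needle);
}
#endif
```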

【Comments】: