【Question Title】: Find all anagrams in a string O(n) solution
【Posted】: 2017-01-20 10:44:10
【Question】:

Here is the problem:

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Input: s: "cbaebabacd" p: "abc"
Output: [0, 6]
Input: s: "abab" p: "ab"
Output: [0, 1, 2]

Here is my solution:

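#include <string>
#include <vector>
using namespace std;

// Sliding-window letter counts; assumes s and p contain only 'a'..'z'.
vector<int> findAnagrams(string s, string p) {
    vector<int> res, s_map(26, 0), p_map(26, 0);
    int s_len = s.size();
    int p_len = p.size();
    if (s_len < p_len) return res;
    // Count the letters of p and of the first window s[0, p_len).
    for (int i = 0; i < p_len; i++) {
        ++s_map[s[i] - 'a'];
        ++p_map[p[i] - 'a'];
    }
    if (s_map == p_map)
        res.push_back(0);
    // Slide the window one character: add s[i], drop s[i - p_len].
    for (int i = p_len; i < s_len; i++) {
        ++s_map[s[i] - 'a'];
        --s_map[s[i - p_len] - 'a'];
        if (s_map == p_map)  // compares all 26 counts
            res.push_back(i - p_len + 1);
    }
    return res;
}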

However, I think this is an O(n^2) solution, because I have to compare the vectors s_map and p_map. Is there an O(n) solution to this problem?

【Question Discussion】:

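  • This isn't a very good fit for Stack Overflow. Do you know of an O(n) algorithm? Have you looked for one? If you're after general advice, Quora might be a better place. Note that when permutations are involved, you are unlikely to find an O(n) solution.
  • Not sure whether this is better, but you could first generate all permutations of p and then use something like Aho-Corasick string matching. When you say O(n), what does n refer to (there are two parameters: the length of s and the length of p)?
  • You might also be interested in this library: combinatorics.codeplex.com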
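To gauge the generate-all-permutations suggestion: the number of distinct rearrangements of p is the multinomial coefficient below, where k_c is the number of occurrences of letter c in p (a standard counting fact, not from the thread). For p = "abc" it is 3!/(1!·1!·1!) = 6; for p = "aab" it is 3!/2! = 3; for ten distinct letters it is already 10! = 3,628,800 patterns, so the set handed to Aho-Corasick can grow factorially in |p|.

```latex
\#\{\text{distinct permutations of } p\}
  \;=\; \frac{|p|!}{\prod_{c} k_c!},
\qquad k_c = \text{number of occurrences of letter } c \text{ in } p
```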

Tags: algorithm data-structures


【Solution 1】:

假设p 的大小为n

假设您有一个大小为 26 的数组 A,其中填充了 p 包含的 a、b、c、... 的数量。

然后创建一个大小为 26 的新数组 B,用 0 填充。

让我们调用给定的(大)字符串s

首先,您在s 的第一个n 字符中使用a、b、c、...的编号初始化B

然后您在s 中遍历每个大小为n 的单词,始终更新B 以适应这个n 大小的单词。

总是B 匹配A 你会有一个索引,我们有一个字谜。

要将B 从一个n 大小的单词更改为另一个,请注意您只需在B 中删除前一个单词的第一个字符并添加下一个单词的新字符。

For example:

Input
s: "cbaebabacd" 
p: "abc"          n = 3 (size of p)

A = {1, 1, 1, 0, 0, 0, ... }  // p contains exactly one a, one b and one c.

B = {1, 1, 1, 0, 0, 0, ... }  // initially: counts for the first n-sized word of s, "cba".

compare(A,B)

for i = n; i < size of s; i++ {
    B[ s[i-n] ]--;
    B[ s[ i ] ]++;
    compare(A,B)
}

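and assume compare(A,B) prints the start index of the current word (0 for the first comparison, i - n + 1 inside the loop) whenever A matches B.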

The total complexity is:

first fill of A  = O(size of p)
first fill of B  = O(size of p)   (only the first n characters of s)
first comparison = O(26)
for-loop         = (|s| - n) * (2 + 26) <= 28|s| = O(size of s)
____________________________________________________________________
O(size of s) + 2 * O(size of p) + O(26)

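This is linear in the size of s (assuming |p| <= |s|; otherwise there can be no anagram and you can return immediately).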

【Discussion】:

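【Solution 2】:

Your solution already is an O(n) solution. The size of the s_map and p_map vectors is a constant (26) that does not depend on n, so comparing s_map and p_map takes constant time no matter how large n is.

Your solution needs about 26 * n integer comparisons to finish, which is O(n).

【Discussion】:

• For a general alphabet of size m, the OP's algorithm is O(n + m), where n is a reasonable measure of the input size. For large enough input strings, m is almost negligible.
• @Code-Apprentice I agree. It also uses O(p + m) space, where p is the size of the pattern being searched for.
• @Code-Apprentice Actually, I misspoke. If the pattern size is p, the size of the string being searched is n, and the alphabet size is m, I think the algorithm takes O(p + (n-p)*m). That can be a lot more than O(n+m).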
【Solution 3】:
    // In papers on string searching algorithms, the alphabet is often
    // called Sigma, and it is often not considered a constant. Your
    // algorithm works in O(Sigma * n) time, where n is the length of the
    // longer string. Below is an algorithm that works in O(n) time even
    // when Sigma is too large to make an array of size Sigma, as long as
    // values from Sigma fit in a constant number of "machine words".
    
    // This solution works in O(n) time "with high probability", meaning
    // that for all c > 2 the probability that the algorithm takes more
    // than c*n time is o(n^-c). This is a looser bound than O(n)
    // worst-case because it uses hash tables, which depend on randomness.
    
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <type_traits>
    #include <unordered_map>
    #include <vector>
    
    using namespace std;
    
    // Finding a needle in a haystack. This works for any iterable type
    // whose members can be stored as keys of an unordered_map.
    template <typename T>
    vector<size_t> AnagramLocations(const T& needle, const T& haystack) {
      // Think of a contiguous region of an ordered container as
      // representing a function f with the domain being the type of item
      // stored in the container and the codomain being the natural
      // numbers. We say that f(x) = n when there are n x's in the
      // contiguous region.
      //
      // Then two contiguous regions are anagrams when they have the same
      // function. We can track how close they are to being anagrams by
      // subtracting one function from the other, pointwise. When that
      // difference is uniformly 0, then the regions are anagrams.
      unordered_map<remove_const_t<remove_reference_t<decltype(*needle.begin())>>,
                    intmax_t> difference;
      // As we iterate through the haystack, we track the lead (part
      // closest to the end) and lag (part closest to the beginning) of a
      // contiguous region in the haystack. When we move the region
      // forward by one, one part of the function f is increased by +1 and
      // one part is decreased by -1, so the same is true of difference.
      auto lag = haystack.begin(), lead = haystack.begin();
    
      // To compare difference to the uniformly-zero function in O(1)
      // time, we make sure it does not contain any points that map to
      // 0. Then the property of being uniformly zero is the same as the
      // property of having an empty difference.
      const auto find = [&](const auto& x) {
        difference[x]++;
        if (0 == difference[x]) difference.erase(x);
      };
      const auto lose = [&](const auto& x) {
        difference[x]--;
        if (0 == difference[x]) difference.erase(x);
      };
      vector<size_t> result;
      // First we initialize the difference with the first needle.size()
      // items from both needle and haystack. If the haystack runs out
      // first, it is shorter than the needle and there are no matches.
      for (const auto& x : needle) {
        if (lead == haystack.end()) return result;
        lose(x);
        find(*lead);
        ++lead;
      }
      size_t i = 0;
      if (difference.empty()) result.push_back(i);
      // Now we iterate through the haystack with lead, lag, and i (the
      // position of lag) updating difference in O(1) time at each spot.
      // After each update the region starts one past lag, at i + 1.
      for (; lead != haystack.end(); ++lead, ++lag, ++i) {
        find(*lead);
        lose(*lag);
        if (difference.empty()) result.push_back(i + 1);
      }
      return result;
    }
    
    int main() {
      string needle, haystack;
      cin >> needle >> haystack;
      const auto result = AnagramLocations(needle, haystack);
      for (auto x : result) cout << x << ' ';
    }
    

【Discussion】:
