Find prefix within a string: answers

【Question title】: Find prefix within a string
【Posted】: 2020-10-17 01:02:00
【Question description】:

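I'm currently working on a LeetCode problem where I have to find a prefix within a sentence and return the number of the word that starts with it, or -1 if there is none. I came up with a solution, but it fails on some strings and I don't know why. One example: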

Input: sentence = "i love eating burger", searchWord = "burg"
Output: 4 (I also get an output of 4)
Explanation: "burg" is a prefix of "burger", which is the 4th word in the sentence.

But this example fails:

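Input: sentence = "this problem is an easy problem", searchWord = "pro"
Output: 2 (I get an output of 6)
Explanation: "pro" is a prefix of "problem", which is the 2nd and the 6th word in the sentence, but we return 2 because it is the smallest index.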

My cout for this case prints a very odd snippet:

 problem is an easy problem
 problem is an easy problem
 problem is an easy problem
 problem is an easy problem
probl
proble
problem
problem
problem i
problem is

When i is incremented, it completely skips the first few substrings, and this is the only case where that happens.

int isPrefixOfWord(string sentence, string searchWord)
{
    string sub;
    int count = 1;
    for (int i = 0; i < sentence.length(); i++)
    {
        if (sentence[i] == ' ')
            count++;
        for (int j = i; j < sentence.length(); j++)
        {
            sub = sentence.substr(i, j);
            cout<<sub<<endl;
            if (sub == searchWord)
            {
                return count;
            }
        }
    }
    return -1;
}

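Any ideas?

Update: my current attempt, which passes 38 of 39 tests but hits a heap buffer overflow on the last one: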

int isPrefixOfWord(string sentence, string searchWord)
{
    string sub;
    int count = 1;
    for (int i = 0; i < sentence.length() - searchWord.length() - 1; i++)
    {
        if (sentence[i] == ' ')
            count++;
        
        sub = sentence.substr(i,searchWord.length());
        if ( sub == searchWord && (sentence[i-1] == ' ' || i == 0))
        {
            return count;
        }
    
    }
    return -1;
}
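
A plausible reading of the heap buffer overflow in this second attempt: `sentence[i-1]` is evaluated before `i == 0`, so a match in the very first word reads `sentence[-1]`. In addition, the loop bound `sentence.length() - searchWord.length() - 1` is unsigned, so it wraps around to a huge value when the sentence is not longer than the search word, and otherwise it stops short of the last valid start positions. The sketch below keeps the same approach with those points changed. The name `isPrefixOfWordFixed` is invented here, and the word counting still assumes words are separated by single spaces, as in the LeetCode statement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical name; same idea as the attempt above with the bounds repaired.
int isPrefixOfWordFixed(const std::string& sentence, const std::string& searchWord)
{
    // Guard first: the subtraction below is unsigned and would wrap around.
    if (sentence.length() < searchWord.length())
        return -1;

    // Last index at which searchWord can still fit inside sentence.
    const std::size_t last = sentence.length() - searchWord.length();
    int count = 1;
    for (std::size_t i = 0; i <= last; i++)
    {
        if (sentence[i] == ' ')
            count++;

        // Test i == 0 before touching sentence[i - 1].
        const bool atWordStart = (i == 0) || (sentence[i - 1] == ' ');
        // compare() checks the characters in place, without copying a substring.
        if (atWordStart && sentence.compare(i, searchWord.length(), searchWord) == 0)
            return count;
    }
    return -1;
}
```

With this version a search word that only occurs in the middle of a word, such as "urg", is rejected because the position is not at a word start.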

【Question comments】:

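You are using substr incorrectly. You should also add your program's output to your question.
The second argument is a length counted from i. You're not checking that.
Your program contains cout<<sub<<endl;. Why isn't its output shown?
Hmm, the duplicate was wrong. You should actually change the loop.
You are using sub to see whether it is the word you are searching for. Your output shows many strings that you shouldn't even be considering.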
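
The comments above point at `std::string::substr`: its second argument is a character count, not an end index, and the count is silently clamped at the end of the string. That accounts for the output in the question. At `i = 5` the first call is `substr(5, 5)`, which already yields "probl", so "pro" is never produced. A small sketch of the difference, where `slice` is a made-up helper and not part of the standard library:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the characters in the half-open index range [first, last).
// Assumes first <= last and first <= s.size().
std::string slice(const std::string& s, std::size_t first, std::size_t last)
{
    // substr takes (position, count), so convert the end index to a length.
    return s.substr(first, last - first);
}
```

For the sentence "this problem is an easy problem", `substr(5, 3)` and `slice(s, 5, 8)` both give "pro", while `substr(5, 5)` gives "probl".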

Tags: c++ algorithm

【Solution 1】:

A very simple C++20 solution using starts_with:

#include <string>
#include <sstream>
#include <iostream>

int isPrefixOfWord(std::string sentence, std::string searchWord)
{
    int count = 1;
    std::istringstream strm(sentence);
    std::string word;
    while (strm >> word)
    {
        if ( word.starts_with(searchWord) )
           return count;
        ++count;
    }
    return -1;        
}

int main()
{
    std::cout << isPrefixOfWord("i love eating burger",  "burg") << "\n";
    std::cout << isPrefixOfWord("this problem is an easy problem", "pro") << "\n";
    std::cout << isPrefixOfWord("this problem is an easy problem", "lo");
}

Output:

4
2
-1

At the time of writing, many online coding sites such as LeetCode do not support C++20, so this code will not compile on those platforms.

So, here is a live example using a C++20 compiler
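
Where C++20 is not available, the same check can be approximated with `std::string::compare`, which predates C++11. A minimal sketch along the lines of the code above; `startsWith` and `isPrefixOfWordPreCpp20` are names invented for this example:

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for std::string::starts_with on pre-C++20 compilers.
bool startsWith(const std::string& word, const std::string& prefix)
{
    // compare(pos, len, str) clamps len to the end of word, so a prefix
    // longer than the word simply compares unequal.
    return word.compare(0, prefix.size(), prefix) == 0;
}

int isPrefixOfWordPreCpp20(const std::string& sentence, const std::string& searchWord)
{
    std::istringstream strm(sentence);
    std::string word;
    int count = 1;  // 1-based word number
    while (strm >> word)
    {
        if (startsWith(word, searchWord))
            return count;
        ++count;
    }
    return -1;
}
```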

【Comments】:

It doesn't matter which stream class you use, as long as it works. I tend to be explicit and use std::istringstream for input and std::ostringstream for output.

【Solution 2】:

We can use std::basic_stringstream to solve this. This will pass:

// Most of these headers are already included on LeetCode;
// Can be removed there;
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>

// The following block might slightly improve the execution time;
// Can be removed;
// (renamed: identifiers containing a double underscore are reserved)
static const auto optimize_io = []() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
    std::cout.tie(nullptr);
    return 0;
}();


// Plain struct: storage-class and cv specifiers do not apply to a type declaration
struct Solution {
    static int isPrefixOfWord(
        const std::string sentence,
        const std::string_view search_word
    ) {
        std::basic_stringstream stream_sentence(sentence);
        int index = 1;
        std::string word;

        while (stream_sentence >> word) {
            // find() returns 0 only when search_word starts at position 0
            if (!word.find(search_word)) {
                return index;
            }

            ++index;
        }

        return -1;

    }
};
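
The test `!word.find(search_word)` relies on `find` returning 0 when the match sits at the very start of `word`; any other result, including `std::string::npos`, is nonzero. It is correct, but on a mismatch at position 0 `find` goes on to scan the rest of the word. One common alternative is `rfind(prefix, 0)`, which only considers position 0. A sketch, with `hasPrefix` as an invented name:

```cpp
#include <string>

// Hypothetical helper: true when word begins with prefix.
bool hasPrefix(const std::string& word, const std::string& prefix)
{
    // rfind(prefix, 0) searches backwards starting at index 0, so the only
    // candidate position is 0; it returns npos when that position fails.
    return word.rfind(prefix, 0) == 0;
}
```

For example, `find` locates "pro" inside "approve" at index 2, which the `!` test correctly rejects, while `hasPrefix` rejects it without looking past the first position.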

【Comments】:

【Solution 3】:

The bug that affects your function's output is that you don't account for how i advances around the inner for loop:

for (int i = 0; i < sentence.length(); i++)
{
    if (sentence[i] == ' ')
        count++;

    for (int j = i; j < sentence.length(); j++)
    {
        sub = sentence.substr(i, j);
        cout<<sub<<endl;
        if (sub == searchWord)
        {
            return count;
        }
    }
}

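Note that once your inner loop finishes, i always advances by 1. Your next search therefore starts at the next character of the same word, so you end up searching for "sub-words" rather than prefixes only, which produces false positives (and unnecessary work).

Also note that every time you do this: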

(sub == searchWord)

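this checks all j characters, even though we are only interested in whether the new jth character matches.

Another bug, which affects your performance and your couts, is that you don't handle a mismatch: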

if (sub == searchWord)

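...is never followed by a break when it is false, so the only way out of the inner loop is to keep incrementing j until the end of the string, and sub ends up being large.

One way to fix the second bug is to replace your inner loop like this (the second argument of substr is a length, not an end index):

    if (sentence.substr(i, searchWord.length()) == searchWord)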
        return count;

Finally, with all the bugs fixed:

int isPrefixOfWord (const string & sentence, const string & searchWord)
{
    if (sentence.length() < searchWord.length())
        return -1;

    const size_t i_max = sentence.length() - searchWord.length();

    for (size_t i = 0, count = 1; ; ++count)
    {
        // flush spaces:
        while (sentence[i] == ' ')
        {
            if (i >= i_max)
                  return -1;

            ++i;
        }

        if (sentence.substr(i, searchWord.length()) == searchWord)
            return count;
      
        // flush word:
        while (sentence[i] != ' ')
        {
            if (i >= i_max)
                  return -1;

            ++i;
        }
    }
  
    return -1;
}

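Note that substr returns a copy (it is not just a view onto the string), so each call takes time linear in searchWord.length(), which is especially wasteful when the words in sentence are shorter than that.

We can improve the speed by replacing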

if (sentence.substr(i, searchWord.length()) == searchWord)
    return count;

...with

    for (size_t j = 0; sentence[i] == searchWord[j]; )
    {
        ++j;
   
        if (j == searchWord.size())
            return count;

        ++i;
    }
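
For reference, this is roughly how the in-place comparison slots into a full function. It is a sketch rather than the answer's verbatim code: the name `isPrefixOfWordInPlace` is invented, the loops are bounded by the string sizes instead of `i_max`, and an empty `searchWord` is treated as "no match" by assumption.

```cpp
#include <cstddef>
#include <string>

int isPrefixOfWordInPlace(const std::string& sentence, const std::string& searchWord)
{
    const std::size_t m = sentence.size();
    const std::size_t n = searchWord.size();
    if (n == 0 || m < n)
        return -1;

    std::size_t i = 0;
    for (int count = 1; ; ++count)
    {
        // Skip spaces before the next word.
        while (i < m && sentence[i] == ' ')
            ++i;

        // Stop once the remaining text is shorter than the prefix.
        if (i + n > m)
            return -1;

        // Compare in place, character by character, without copying.
        std::size_t j = 0;
        while (j < n && sentence[i] == searchWord[j])
        {
            ++i;
            ++j;
        }
        if (j == n)
            return count;

        // Skip whatever is left of the current word.
        while (i < m && sentence[i] != ' ')
            ++i;
    }
}
```

Each character of the sentence is visited at most once, since `i` only ever moves forward.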

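Others have shown nice uses of library facilities that help solve the problem.

If you don't have access to those facilities for your assignment, or you just want to see how to modularize a problem like this without losing efficiency, here is one way to do it in C++11 with no library support other than string: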

#include <cstddef>
#include <string>

bool IsSpace (char c)
{
    return c == ' ';
}

bool NotSpace (char c)
{
    return c != ' ';
}

class PrefixFind
{
    using CharChecker = bool (*)(char);

    template <CharChecker Condition>
    void FlushWhile ()
    {
        while ((m_index < sentence.size()) 
            && Condition(sentence[m_index]))
            ++m_index;
    }
    
    void FlushWhiteSpaces ()
    {
        FlushWhile<IsSpace>();
    }
        
    void FlushToNextWord ()
    {
        FlushWhile<NotSpace>();
        FlushWhile<IsSpace>();
    }
    
    bool PrefixMatch ()
    {
        // SearchOngoing() must equal `true`
    
        size_t j = 0;
    
        while (sentence[m_index] == search_prefix[j])
        {
            ++j;
            
            if (j == search_prefix.size())
                return true;
            
            ++m_index;
        }
        
        return false;
    }
    
    bool SearchOngoing () const
    {
        return m_index + search_prefix.size() <= sentence.size();
    }
    
    const std::string & sentence;
    const std::string & search_prefix;
    size_t m_index;

public:

    PrefixFind (const std::string & s, const std::string & sw)
        : sentence(s),
        search_prefix(sw)
    {}

    int FirstMatchingWord ()
    {
        const int NO_MATCHES = -1;
    
        if (!search_prefix.length())
            return NO_MATCHES;
    
        m_index = 0;
        FlushWhiteSpaces();

        for (int n = 1; SearchOngoing(); ++n)
        {
            if (PrefixMatch())
                return n;

            FlushToNextWord();
        }

        return NO_MATCHES;
    }
};

In terms of speed: if sentence has length m and searchWord has length n, the original (buggy) code has a time complexity of O(n*m^2). With this improvement we get O(m).

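【Comments】:

This makes sense, but for some reason it doesn't pass all the tests. While waiting for someone to help I tried fixing it myself, and my current solution passes 38/39 tests, but the last one hits a heap buffer overflow. I'll post my code at the bottom of the question.
@GeorgeKhanachat, how about now? =)
You left one bug in: your opening-brace alignment seems to be off :-)
Yeah, a bit of humor. Inline braces > newline braces! Otherwise, your answer is solid!
@MarkMoretto, haha. Thanks. Yeah, I figured you weren't being serious, but I wasn't sure what you meant. And yes, I'm a big fan of the newline format. =P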