Creating a custom comparator in C++ (answers)

【Question title】: Creating a custom comparator in C++
【Posted】: 2020-03-08 21:07:06
【Question description】:

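Background:

I was asked this question in an online practice interview today, and I had a hard time coming up with a custom comparator for the sort. Here is the question.

Problem:

Implement a document scanning function wordCountEngine, which receives a string document and returns a list of all unique words in it and their number of occurrences, sorted by the number of occurrences in descending order. If two or more words have the same count, they should be sorted according to their order in the original sentence. Assume that all letters are in the English alphabet. Your function should be case-insensitive, so for instance, the words "Perfect" and "perfect" should be considered the same word.

The engine should strip out punctuation (even in the middle of a word) and use whitespace to separate words.

Analyze the time and space complexities of your solution. Try to optimize for time while keeping a polynomial space complexity.

Example: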

input: document = "Practice makes perfect. you'll only get Perfect by practice. just practice!"

output: [ ["practice", "3"], ["perfect", "2"], ["makes", "1"], ["youll", "1"], ["only", "1"], ["get", "1"], ["by", "1"], ["just", "1"] ]

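My thoughts:

My first idea was to take the string, strip its punctuation, lowercase it, and split it into a vector of strings. Then I used an unordered_map container to store each string and its number of occurrences. Where I got stuck was creating a custom comparator that makes sure strings with the same count are sorted by the order in which they appear in the given string.

Code: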

#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <sstream>
#include <iterator>
#include <numeric>
#include <algorithm>
using namespace std;


struct cmp
{
  bool operator()(std::string& word1, std::string& word2)
  {

  }
};

vector<vector<string>> wordCountEngine( const string& document ) 
{
  // your code goes here
  // Step 1
  auto doc = document;
  std::string str;
  remove_copy_if(doc.begin(), doc.end(), std::back_inserter(str), 
                     std::ptr_fun<int, int>(&std::ispunct));
  for(int i = 0; i < str.size(); ++i)
    str[i] = tolower(str[i]);
  std::stringstream ss(str);
  istream_iterator<std::string> begin(ss);
  istream_iterator<std::string> end;
  std::vector<std::string> vec(begin, end);

  // Step 2
  std::unordered_map<std::string, int> m;
  for(auto word : vec)
    m[word]++;

  // Step 3
  std::vector<std::vector<std::string>> result;
  for(auto it : m)
  {
    result.push_back({it.first, std::to_string(it.second)});
  }


  return result;

}

int main() {

  std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
  auto result = wordCountEngine(document);
  for(int i = 0; i < result.size(); ++i)
  {
    for(int j = 0; j < result[0].size(); ++j)
    {
      std::cout << result[i][j] << " ";
    }
    std::cout << "\n";
  }

  return 0;
}

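If someone could help me learn how to build a custom comparator for this code, I would greatly appreciate it.

【Question discussion】:

Try using std::vector<std::pair<std::string, int>> as the result.

Tags: c++ algorithm sorting

【Solution 1】:

You can use a std::vector<std::pair<std::string, int>>, where each pair represents a word and the number of times that word occurs in the sequence. Using a vector helps preserve the original order of the sequence when two or more words have the same count. Finally, sort by the number of occurrences.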

#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::vector<std::string>> wordCountEngine(const std::string& document)
{
    std::vector<std::pair<std::string, int>> words;
    std::istringstream ss(document);
    std::string word;

    //Loop through whitespace-separated words in sequence
    while (ss >> word)
    {
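        //Convert to lowercase (go through unsigned char to avoid UB on negative char values)
        std::transform(word.begin(), word.end(), word.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

        //Remove every non-letter character (punctuation, digits)
        auto it = std::remove_if(word.begin(), word.end(), [](unsigned char c) { return !std::isalpha(c); });
        word.erase(it, word.end());
        if (word.empty()) continue;  //Skip tokens that contained no letters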

        //Find this word in the result vector
        auto pos = std::find_if(words.begin(), words.end(),
            [&word](const std::pair<std::string, int>& p) { return p.first == word; });
        if (pos == words.end()) {
            words.push_back({ word, 1 });  //Doesn't occur -> add it
        }
        else {
            pos->second++;                 //Increment count
        }
    }

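    //Sort by occurrences; std::stable_sort keeps first-appearance order for equal counts
    //(plain std::sort does not guarantee the relative order of equal elements)
    std::stable_sort(words.begin(), words.end(),
        [](const std::pair<std::string, int>& p1, const std::pair<std::string, int>& p2) { return p1.second > p2.second; });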

    //Convert to vector<vector<string>>
    std::vector<std::vector<std::string>> result;
    result.reserve(words.size());

    for (auto& p : words)
    {
        std::vector<std::string> v = { p.first, std::to_string(p.second) };
        result.push_back(v);
    }
    return result;
}

int main()
{
    std::string document = "Practice makes perfect. you'll only get Perfect by practice. just practice!";
    auto result = wordCountEngine(document);
    for (auto& word : result)
    {
        std::cout << word[0] << ", " << word[1] << std::endl;
    }
    return 0;
}

Output:
practice, 3
perfect, 2
makes, 1
youll, 1
only, 1
get, 1
by, 1
just, 1
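
The answer above looks each word up with a linear find_if, so the counting step is quadratic in the number of distinct words. Since the question asks to optimize for time, here is a minimal sketch of one possible variant, not the answerer's code. It assumes a hypothetical WordInfo record, a hypothetical ByCountThenFirstSeen comparator functor, and a hypothetical function name wordCountEngineHashed. An unordered_map gives average constant-time lookups, and the comparator breaks ties by order of first appearance, which is the custom comparator the question asks about.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: one entry per distinct word.
struct WordInfo {
    std::string word;
    int count;
    std::size_t firstSeen;  // rank of the word's first appearance among distinct words
};

// Hypothetical comparator: higher count first, earlier first appearance breaks ties.
struct ByCountThenFirstSeen {
    bool operator()(const WordInfo& a, const WordInfo& b) const {
        if (a.count != b.count) return a.count > b.count;
        return a.firstSeen < b.firstSeen;
    }
};

std::vector<std::vector<std::string>> wordCountEngineHashed(const std::string& document) {
    std::unordered_map<std::string, std::size_t> slot;  // word -> index into infos
    std::vector<WordInfo> infos;                        // distinct words in first-seen order
    std::istringstream ss(document);
    std::string token;

    while (ss >> token) {
        // Keep letters only, lowercased; this drops punctuation inside words too.
        std::string word;
        for (unsigned char c : token)
            if (std::isalpha(c)) word += static_cast<char>(std::tolower(c));
        if (word.empty()) continue;  // token was punctuation only

        auto it = slot.find(word);
        if (it == slot.end()) {
            slot.emplace(word, infos.size());
            infos.push_back({word, 1, infos.size()});
        } else {
            ++infos[it->second].count;
        }
    }

    // firstSeen is unique per word, so the comparator is a strict total order
    // and plain std::sort gives a deterministic result.
    std::sort(infos.begin(), infos.end(), ByCountThenFirstSeen{});

    std::vector<std::vector<std::string>> result;
    result.reserve(infos.size());
    for (const auto& info : infos)
        result.push_back({info.word, std::to_string(info.count)});
    return result;
}
```

Under these assumptions, counting takes expected linear time in the length of the document and the sort takes O(k log k) for k distinct words, with O(k) extra space.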

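【Discussion】:

@Snorrlaxxx I added some comments to help explain it.
I think your code would fail for this test case: std::string document = "Practice makes perfect, you'll get perfect by practice. just practice! just just just!!";
@Snorrlaxxx For that test case, the output is: just, 4; practice, 3; perfect, 2; makes, 1; youll, 1; get, 1; by, 1. That is correct. However, if you remove one "just", then "practice" comes first.
Can you help me convert the vector<pair<string, int>> into a vector<vector<string>>? When I tried it, the test case I mentioned failed.
@Snorrlaxxx I have updated the code to return vector<vector<string>>.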

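【Solution 2】:

In the second step, try this:

std::vector<std::pair<std::pair<std::string, int>, int>> m;

Here, the inner pair stores the string and the index at which it first occurs, and the vector stores that pair together with its occurrence count. Write logic that sorts by count first and, if the counts are equal, by position of occurrence.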

bool sort_vector(const std::pair<const std::pair<std::string,int>,int> &a, const std::pair<const std::pair<std::string,int>,int> &b)
{
    if(a.second==b.second)
    {
        // Equal number of occurrences: order by the position of the string
        return a.first.second<b.first.second;
    }
    // Otherwise the string with the higher number of occurrences comes first
    return a.second>b.second;
}

You have to write the logic that computes the number of occurrences and the index of occurrence of each word in the string.
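
One way to fill in that missing logic is sketched below. It is only an illustration, not the answerer's code: the Entry alias, countWithPositions, and byCountThenPosition are hypothetical names, and the comparator repeats the ordering of sort_vector with a plain (non-const) inner pair so that the vector elements stay assignable during sorting.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// ((word, index of first occurrence), count): the layout proposed in the answer.
using Entry = std::pair<std::pair<std::string, int>, int>;

// Same ordering as sort_vector: count descending, then first position ascending.
bool byCountThenPosition(const Entry& a, const Entry& b) {
    if (a.second == b.second) return a.first.second < b.first.second;
    return a.second > b.second;
}

std::vector<Entry> countWithPositions(const std::string& document) {
    std::vector<Entry> m;
    std::istringstream ss(document);
    std::string token;
    int index = 0;  // position of the current word within the document

    while (ss >> token) {
        // Keep letters only, lowercased.
        std::string word;
        for (unsigned char c : token)
            if (std::isalpha(c)) word += static_cast<char>(std::tolower(c));
        if (word.empty()) continue;  // token was punctuation only

        auto pos = std::find_if(m.begin(), m.end(),
            [&word](const Entry& e) { return e.first.first == word; });
        if (pos == m.end())
            m.emplace_back(std::make_pair(word, index), 1);  // first occurrence
        else
            ++pos->second;                                   // seen before: bump count
        ++index;
    }

    std::sort(m.begin(), m.end(), byCountThenPosition);
    return m;
}
```

Because every entry records a distinct first-occurrence index, ties on count are always resolved by the comparator itself, so the result does not depend on whether the sort is stable.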

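【Discussion】:

Good suggestion! Would I use a custom comparator that takes two pairs to perform the logic part?
I guess that won't work, since I also need to know the counts. I suppose a function should be able to do it.
Yes, if you could help with some useful code, I'm still a bit stuck.
You can't sort an unordered_map directly; you have to put its elements into a vector. So I have upgraded my answer a bit. Use std::vector<std::pair<std::pair<std::string, int>, int>> m;
That is a long, confusing container; I will give it a try.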