C++ - Multithreading takes longer with more threads

【Question title】: C++ - Multithreading takes longer with more threads
【Posted】: 2019-04-29 21:34:47
【Question】:

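I'm writing a parallel password cracker for an assignment. When I launch multiple threads, the more threads I add, the longer the cracking takes. What is going wrong here?

Secondly, what resource-sharing techniques could I use to get the best performance? I'm required to use mutexes, atomic operations, or barriers, while also using semaphores, condition variables, or channels. Mutexes seem to slow my program down drastically.

Here is a sample of my code for context: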

std::mutex mtx;
std::condition_variable cv;

void run()
{
  std::unique_lock<std::mutex> lck(mtx);
  ready = true;
  cv.notify_all();
}

crack()
{
  std::lock_guard<std::mutex> lk(mtx);
  ...do cracking stuff
}

main()
{
  ....

  std::thread *t = new std::thread[uiThreadCount];

  for(int i = 0; i < uiThreadCount; i++)
  {
    t[i] = std::thread(crack, params);
  }

  run();

  for(int i = 0; i < uiThreadCount; i++)
  {
    t[i].join();
  }

}

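【Comments】:

Nothing in your code happens in parallel. Every thread locks the same mutex right at the start.
Also note that creating and managing threads has some overhead. Having more of them does not automatically speed up your computation.
I'm pretty sure this code won't compile.
Completely unrelated, but it may save you some trouble later: replace std::thread *t = new std::thread[uiThreadCount]; with a std::vector<std::thread> and emplace_back the threads as needed.
Consider 5 Big Fat Reasons Why Mutexes Suck Big Time (number 2).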

Tags: c++ multithreading mutex

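【Answer 1】:

When writing multithreaded code, it's generally a good idea to share as few resources as possible, so that you can avoid synchronizing with a mutex or an atomic.

There are many ways to crack passwords, so I'll give a slightly simpler example. Suppose you have a hash function and a hash, and you're trying to guess which input produced the hash (this is basically how passwords get cracked).

We can write the cracker like this. It takes the hash function and the password hash, checks a range of values, and calls a callback function whenever it finds a match.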

auto cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, auto callback) {
    for(auto i = min; i < max; i++) {
        auto output = hashFunc(i); 
        if(output == passwdHash) {
             callback(i);
        }
    }
};

Now we can write a parallel version. This version only needs to synchronize when a match is found, which is very rare.

auto parallel_cracker = [](auto passwdHash, auto hashFunc, auto min, auto max, int num_threads) {
    // Get a vector of threads
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    // Make a vector of all the matches it discovered
    using input_t = decltype(min); 
    std::vector<input_t> matches; 
    std::mutex match_lock;

    // Whenever a match is found, this function gets called
    auto callback = [&](input_t match) {
        std::unique_lock<std::mutex> _lock(match_lock); 
        std::cout << "Found match: " << match << '\n';
        matches.push_back(match); 
    };

    for(int i = 0; i < num_threads; i++) {
        auto sub_min = min + ((max - min) * i) / num_threads;
        auto sub_max = min + ((max - min) * (i + 1)) / num_threads;
        // Start the worker on its own sub-range (threads, not matches, owns the std::thread)
        threads.emplace_back(cracker, passwdHash, hashFunc, sub_min, sub_max, callback);
    }

    // Join all the threads
    for(auto& thread : threads) {
        thread.join(); 
    }
    return matches; 
};
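
To try this approach end to end, here is a minimal, self-contained sketch of the same idea. It is a condensed variation rather than the exact code above: toy_hash is a hypothetical stand-in for a real hash function (it is not cryptographic), and find_preimages is an illustrative name that does not come from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real hash function; NOT cryptographic.
inline std::uint64_t toy_hash(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Searches [min, max) using num_threads threads and returns every input
// whose hash equals target. The mutex is taken only when a match is found.
inline std::vector<std::uint64_t> find_preimages(std::uint64_t target,
                                                 std::uint64_t min,
                                                 std::uint64_t max,
                                                 int num_threads) {
    assert(num_threads > 0 && min <= max);
    std::vector<std::uint64_t> matches;
    std::mutex match_lock;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    // Note: span * i can overflow for very large ranges; fine for a sketch.
    const std::uint64_t span = max - min;
    for (int i = 0; i < num_threads; i++) {
        // Each thread gets its own disjoint sub-range, so the hot loop needs no locking.
        const std::uint64_t sub_min = min + span * i / num_threads;
        const std::uint64_t sub_max = min + span * (i + 1) / num_threads;
        threads.emplace_back([&, sub_min, sub_max] {
            for (std::uint64_t v = sub_min; v < sub_max; v++) {
                if (toy_hash(v) == target) {
                    std::lock_guard<std::mutex> lock(match_lock);
                    matches.push_back(v);
                }
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    // Threads finish in arbitrary order, so sort for a stable result.
    std::sort(matches.begin(), matches.end());
    return matches;
}
```

Calling find_preimages(toy_hash(123456), 0, 1000000, 4) searches one million candidates on four threads. Because the threads never contend for the lock inside the loop, adding threads should help up to roughly the number of hardware cores.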

【Comments】:

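【Answer 2】:

Yes, and given how it's written that's not surprising: by locking a mutex at the very start of the thread function (crack), you effectively make the threads run one after another.

I understand that you want a "synchronized start" for the threads (that's the intent behind the condition variable cv), but you aren't using it correctly. Without a call to one of its wait methods, calling cv.notify_all() is useless: it won't do what you intend, and your threads will simply run sequentially.

Calling wait() on the std::condition_variable inside crack() is imperative: it releases mtx (which you have just acquired through the lock object lk) and blocks the thread until cv.notify_all() is called. After that call, each woken thread reacquires mtx in turn, so all but the first one (whichever that happens to be) stay blocked on mtx. If you really want "parallel" execution, you therefore need to unlock mtx.

Here is what your crack thread should look like: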

crack()
{
  std::unique_lock<std::mutex> lk(mtx);
  cv.wait(lk);
  lk.unlock();

  ...do cracking stuff

}

By the way, the ready flag in your run() call isn't needed as the code stands: nothing ever reads it, so it is completely redundant.
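
That said, a flag like ready becomes useful once wait() is in place. A bare cv.wait(lk) can miss a notify_all() that happens before the thread starts waiting, and condition variables may also wake spuriously. A common remedy is the predicate overload of wait(). The following is a minimal sketch under that assumption; StartGate, wait_for_start, open, and run_workers are hypothetical names, not part of the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical start gate: workers block in wait_for_start() until open() is called.
struct StartGate {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void wait_for_start() {
        std::unique_lock<std::mutex> lk(mtx);
        // The predicate returns immediately if open() already ran, and
        // keeps waiting after a spurious wakeup.
        cv.wait(lk, [this] { return ready; });
        // lk is released when it goes out of scope, so the caller's work
        // runs without holding mtx.
    }

    void open() {
        {
            std::lock_guard<std::mutex> lk(mtx);
            ready = true;
        }
        cv.notify_all();
    }
};

// Starts num_threads workers that all wait at the gate, then releases them.
// Returns how many workers ran their (placeholder) work.
inline int run_workers(int num_threads) {
    assert(num_threads > 0);
    StartGate gate;
    std::atomic<int> done{0};
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; i++) {
        threads.emplace_back([&] {
            gate.wait_for_start();
            done.fetch_add(1);  // placeholder for the cracking work
        });
    }
    gate.open();
    for (auto& t : threads) {
        t.join();
    }
    return done.load();
}
```

With the predicate, it no longer matters whether a worker reaches the gate before or after open() is called, so run_workers cannot hang on a lost notification.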

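"I'm required to use mutexes, atomic operations, or barriers, while also using semaphores, condition variables, or channels"

- Different tools/techniques are good for different things; this part of the question is too broad.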
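
As one illustration of matching a tool to a job (an assumption about what the assignment might accept, not something stated in the question): an atomic suits a "stop once someone has found it" signal, because every thread polls it often and a mutex would be heavier than needed. In this sketch, find_first_match and is_match are hypothetical names, and is_match must be safe to call from several threads at once.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Returns a value in [min, max) for which is_match(value) is true, or `max`
// if there is none. Workers stop early once any thread has recorded a hit.
template <typename Pred>
std::uint64_t find_first_match(std::uint64_t min, std::uint64_t max,
                               int num_threads, Pred is_match) {
    assert(num_threads > 0 && min <= max);
    std::atomic<std::uint64_t> result{max};  // `max` doubles as "not found"
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    const std::uint64_t span = max - min;
    for (int i = 0; i < num_threads; i++) {
        const std::uint64_t lo = min + span * i / num_threads;
        const std::uint64_t hi = min + span * (i + 1) / num_threads;
        threads.emplace_back([&result, &is_match, lo, hi, max] {
            for (std::uint64_t v = lo; v < hi; v++) {
                // Cheap lock-free check: has another thread already succeeded?
                if (result.load(std::memory_order_relaxed) != max) {
                    return;
                }
                if (is_match(v)) {
                    // Record the hit only if no other thread got there first.
                    std::uint64_t expected = max;
                    result.compare_exchange_strong(expected, v);
                    return;
                }
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return result.load();
}
```

If several values match, whichever thread records its hit first wins, so the result is not necessarily the smallest match.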

【Comments】: