Thread sanitizer warnings after using parallel std::for_each - answer

【Question Title】: Thread sanitizer warnings after using parallel std::for_each
【Posted】: 2022-01-10 18:22:08
【Question Description】:

I created the following simple test program to see how parallel execution works with std::for_each.

#include <algorithm>   // std::for_each, std::equal
#include <iostream>
#include <vector>
#include <execution>

int main(int ac, char**av){
    constexpr int size=5;
    std::vector<int> v;
    std::vector<int> expected;
    for(int i=0; i<size; ++i) v.push_back(i);
    expected.resize(size);

    std::for_each(std::execution::par, v.begin(), v.end(), [&](auto x){ expected[x]=x; });
    auto eq = std::equal(v.begin(), v.end(), expected.begin());
    std::cout << "Compare: "<<eq<<"\n";
    return 0;
}

The program runs without any problems, but if I build it with ThreadSanitizer, I get data race warnings. Here is the program output:

Compare: 1
==================
WARNING: ThreadSanitizer: data race (pid=47090)
  Write of size 8 at 0x7fab5399b200 by thread T8:
    #0 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 (libtsan.so.10+0x35bc5)
    #1 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:760 (libtsan.so.10+0x35bc5)
    #2 rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long) ../../src/tbbmalloc/frontend.cpp:888 (libtbbmalloc.so.2+0x13700)

  Previous write of size 8 at 0x7fab5399b200 by thread T10:
    #0 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 (libtsan.so.10+0x35bc5)
    #1 memset /tp_src/gcc-9.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:760 (libtsan.so.10+0x35bc5)
    #2 rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long) ../../src/tbbmalloc/frontend.cpp:888 (libtbbmalloc.so.2+0x13700)

  Thread T8 (tid=47099, running) created by thread T4 at:
    #0 pthread_create /tp_src/gcc-9.2.0/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.10+0x3057b)
    #1 rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) ../../src/tbb/../rml/server/thread_monitor.h:218 (libtbb.so.2+0x20ab8)
    #2 tbb::internal::rml::private_worker::wake_or_launch() ../../src/tbb/private_server.cpp:297 (libtbb.so.2+0x20ab8)
    #3 tbb::internal::rml::private_server::wake_some(int) ../../src/tbb/private_server.cpp:395 (libtbb.so.2+0x20ab8)

  Thread T10 (tid=47101, running) created by thread T3 at:
    #0 pthread_create /tp_src/gcc-9.2.0/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.10+0x3057b)
    #1 rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) ../../src/tbb/../rml/server/thread_monitor.h:218 (libtbb.so.2+0x20ab8)
    #2 tbb::internal::rml::private_worker::wake_or_launch() ../../src/tbb/private_server.cpp:297 (libtbb.so.2+0x20ab8)
    #3 tbb::internal::rml::private_server::wake_some(int) ../../src/tbb/private_server.cpp:395 (libtbb.so.2+0x20ab8)

SUMMARY: ThreadSanitizer: data race ../../src/tbbmalloc/frontend.cpp:888 in rml::internal::BootStrapBlocks::allocate(rml::internal::MemoryPool*, unsigned long)
==================

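It looks like for_each has finished and the final comparison succeeded. However, some background threads make ThreadSanitizer unhappy while main is finishing.

Is there anything wrong with this example, or is this a bug or false-positive warning in ThreadSanitizer that I can ignore?

Here is how I compile it:

g++ -O3 -I$TBBINC -std=c++17 -fsanitize=thread  ForEach.cpp $TBBLIB/libtbb.so -o ForEach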

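【Question Discussion】:

I don't think you are allowed to modify any element inside the UnaryFunction other than the one you are given.
I don't think that is the problem. Also, even if I make this lambda empty, I still get the same warning.
In that case it might make sense to make the lambda body empty in your snippet. Also, do you still get the warning if the lambda in for_each doesn't capture by reference?
Yes, I still get the warning if the lambda is empty and captures nothing.
I asked a new question about how for_each synchronizes with the surrounding code: stackoverflow.com/questions/70278724/…

Tags: c++ multithreading c++17 tbb

【Solution 1】:

TL;DR: There is a data race. However, the error message makes no sense to me.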

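The sequential std::for_each applies the function in order, but with std::execution::par no execution order is guaranteed.
std::vector is not thread-safe, and going through an int& would not make it any safer.
There is no mutex and there are no atomics. Therefore, sequential consistency is not guaranteed.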

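A visible side effect A on a scalar object or bit-field M with respect to a value computation B of M satisfies the conditions:
— A happens before B and
— there is no other side effect X to M such that A happens before X and X happens before B.
The value of a non-atomic scalar object or bit-field M, as determined by evaluation B, shall be the value stored by the visible side effect A.
[Note: If there is ambiguity about which side effect to a non-atomic object or bit-field is visible, then the behavior is either unspecified or undefined. — end note]
[Note: This states that operations on ordinary objects are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined below, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution. — end note]

See: [intro.multithread]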

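Also: you have three non-atomic loads and two non-atomic stores.
Two of the loads happen before the first store. On x86-64 these two loads could be reordered. Both read the same non-atomic, unlocked memory.

Thankfully, your algorithm is just an implementation of std::iota with quadratic time complexity, and it has defined behavior when written like this: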

#include <cstddef>
#include <iostream>
#include <numeric>   // std::iota
#include <vector>

int main() {
  constexpr auto kSize = std::size_t(5);
  const auto expected = [] (std::size_t length) {
    auto self = std::vector<int>(length);
    std::iota(self.begin(), self.end(), 0);
    return self;
  } (kSize);
  // Now the raw loop… (Can't be bothered to use a lambda this time)
  auto actual = std::vector<int>(kSize);
  for (auto i = std::size_t(0); i != kSize; ++i)
    actual[i] = i; // No UB. Single thread. 

  std::cout << "Compare: " << (actual == expected);
  std::endl(std::cout);
}

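If the value of kSize cannot be represented exactly as an int, the behavior is not my problem.
Note: untested code. If it doesn't work, replace it with std::puts("Compare: 1");

【Discussion】:

I'm not sure I understand the race you are seeing. Are you suggesting that the accesses to v and expected inside the lambda race with the accesses in the main thread before and after for_each? I certainly don't see any data race between the worker threads themselves, since they all store to different elements.
I agree with the previous comment. Different threads access different elements of the vector, so there is no memory contention. Also, as I mentioned in earlier comments, I still get the same warning even if the lambda is completely empty (no body, no capture).
I also agree that v is not the problem here. The problem is expected, a vector that is modified without being locked. It should be locked before assigning to one of its elements. I'm not sure whether the behavior is defined/specified. Although x cannot change, expected[x] can. I don't think that establishes an order: the value computation is sequenced before, not the side effect. I agree there is no race condition, only a data race, because the same non-atomic, unlocked object is modified by multiple threads. See: en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t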