Is it possible to create a team of threads, and then only "use" the threads later? (Answers)

[Question title]: Is it possible to create a team of threads, and then only "use" the threads later?
[Posted]: 2016-12-16 15:51:16
[Question]:

So I have some OpenMP code:

for(unsigned int it = 0; it < its; ++it)
{
    #pragma omp parallel
    {
        /**
         * Run the position integrator, reset the
         * acceleration, update the acceleration, update the velocity.
         */
        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            Body* body = &bodies[i];
            body->position += (body->velocity * timestep);
            body->position += (0.5 * body->acceleration * timestep * timestep);

            /**
             * Update velocity for half-timestep, then reset the acceleration.
             */
            body->velocity += (0.5 * body->acceleration * timestep);
            body->acceleration = Vector3();
        }

        /**
         * Calculate the acceleration.
         * NOTE: the Newton's-third-law update writes bodies[j] from the thread
         * that owns row i, while other threads may be updating the same body,
         * so as written this loop contains a data race.
         */
        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            for(unsigned int j = i + 1; j < numBods; ++j) // only pairs with j > i
            {
                Body* body = &bodies[i];
                Body* bodyJ = &bodies[j];

                /**
                 * Calculating some of the subsections of the acceleration formula.
                 */
                Vector3 rij = bodyJ->position - body->position;
                double sqrDistWithEps = rij.SqrMagnitude() + epsilon2;
                double oneOverDistCubed = 1.0 / sqrt(sqrDistWithEps * sqrDistWithEps * sqrDistWithEps);
                double scalar = oneOverDistCubed * gravConst;

                body->acceleration += bodyJ->mass * scalar * rij;
                bodyJ->acceleration -= body->mass * scalar * rij; //Newton's Third Law.
            }
        }

        /**
         * Velocity for the full timestep.
         */
        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            bodies[i].velocity += (0.5 * bodies[i].acceleration * timestep);
        }
    }

    /**
     * Don't want I/O to be parallel
     */
    for(unsigned int index = 1; index < bodies.size(); ++index)
    {
        outFile << bodies[index] << std::endl;
    }
}

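This works fine, but I can't help thinking that forking a team of threads on every iteration is a bad idea. However, the iterations must run in order, so I can't make the iteration loop itself parallel.

I'm just wondering whether there is a way to set this up so that the same team of threads is reused on every iteration?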
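Separately from the thread-team question: in the acceleration loop above, the symmetric update `bodyJ->acceleration -= ...` lets the thread that owns row `i` write `bodies[j]` while another thread may be writing the same body, which is a data race. A minimal race-free sketch, assuming simplified hypothetical `Vec3`/`Body` types in place of the question's `Vector3`/`Body`, has each iteration `i` accumulate only its own body's acceleration. The function name `computeAccelerations` and the chunk size `16` are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical minimal stand-ins for the question's Vector3 and Body.
struct Vec3 {
    double x = 0, y = 0, z = 0;
};

struct Body {
    Vec3 position;
    Vec3 acceleration;
    double mass = 1.0;
};

// Race-free acceleration step: iteration i writes only bodies[i],
// so no two threads ever update the same body.
void computeAccelerations(std::vector<Body>& bodies, double gravConst, double epsilon2) {
    assert(epsilon2 >= 0.0);
    const long n = static_cast<long>(bodies.size());
    #pragma omp parallel for schedule(dynamic, 16)
    for (long i = 0; i < n; ++i) {
        Vec3 acc;  // accumulate locally, store once at the end
        for (long j = 0; j < n; ++j) {
            if (j == i) continue;  // skip self-interaction (would be 0/0 when epsilon2 == 0)
            const double dx = bodies[j].position.x - bodies[i].position.x;
            const double dy = bodies[j].position.y - bodies[i].position.y;
            const double dz = bodies[j].position.z - bodies[i].position.z;
            const double sqrDistWithEps = dx * dx + dy * dy + dz * dz + epsilon2;
            const double oneOverDistCubed =
                1.0 / std::sqrt(sqrDistWithEps * sqrDistWithEps * sqrDistWithEps);
            const double s = bodies[j].mass * oneOverDistCubed * gravConst;
            acc.x += s * dx;
            acc.y += s * dy;
            acc.z += s * dz;
        }
        bodies[i].acceleration = acc;  // replaces the separate "reset to zero" step
    }
}
```

This evaluates every pair twice instead of once, trading the saving from Newton's third law for freedom from races; the per-body result is the same sum as in the original loop.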

Tags: c++ multithreading parallel-processing openmp

[Answer 1]:

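As far as I know, this is already the most sensible approach: the thread pool is created once, and each time a thread reaches a parallel construct it requests a team of threads from the pool. So a new thread pool is not created every time a parallel region is reached. But if you want to reuse the same threads, why not simply move the parallel construct outside the loop and handle the sequential code with the single pragma, something like this: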

#pragma omp parallel
{
    for(unsigned int it = 0; it < its; ++it)
    {
        ...

        ...

        /**
         * Don't want I/O to be parallel
         */
        #pragma omp single
        {
            for(unsigned int index = 1; index < bodies.size(); ++index)
            {
                outFile << bodies[index] << std::endl;
            }
        } // threads wait at the implicit barrier at the end of the single construct
    }
}
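A self-contained sketch of the same structure, with the elided n-body steps replaced by a hypothetical stand-in (independent unit oscillators advanced with symplectic Euler); `simulate` and its parameters are illustrative, not from the question. One parallel region wraps the whole time loop, `omp for` splits each step, and `omp single` keeps output serial. Compile with `-fopenmp` (GCC/Clang); without it the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Advances each oscillator `its` times and writes one line of positions per step.
void simulate(std::vector<double>& x, std::vector<double>& v,
              int its, double dt, std::ostream& out) {
    assert(x.size() == v.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel  // team is started once, outside the time loop
    {
        for (int it = 0; it < its; ++it) {  // every thread runs every iteration
            #pragma omp for schedule(static)
            for (long i = 0; i < n; ++i) {
                v[i] -= x[i] * dt;
                x[i] += v[i] * dt;
            }  // implicit barrier: all updates finish before output

            #pragma omp single
            {
                for (long i = 0; i < n; ++i) out << x[i] << ' ';
                out << '\n';
            }  // implicit barrier: no thread starts iteration it + 1 early
        }
    }
}
```

`it` is declared inside the region, so each thread has its own copy, and every thread reaches the same `omp for` and `single` constructs in the same order, which OpenMP requires for worksharing constructs.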

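I did a quick search, and the first paragraph of this answer may depend on the OpenMP implementation you use, so I strongly recommend reading its manual.

For example, from this source: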

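OpenMP* is strictly a fork/join threading model. In some OpenMP implementations, threads are created at the start of a parallel region and destroyed at the end of the parallel region. OpenMP applications typically have several parallel regions with intervening serial regions. Creating and destroying threads for each parallel region can result in significant system overhead, especially if a parallel region is inside a loop; therefore, the Intel OpenMP implementation uses thread pools. A pool of worker threads is created at the first parallel region. These threads exist for the duration of program execution. More threads may be added automatically if requested by the program. The threads are not destroyed until the last parallel region is executed.

Still, if you put the parallel region outside the loop, you don't have to worry about the potential overhead mentioned in the paragraph above.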
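One way to check pool reuse on your own runtime is to record which OS threads execute many back-to-back parallel regions. This is a hypothetical diagnostic (the name `countDistinctThreads` is illustrative): if threads are pooled, the count typically stays near the team size rather than growing with the number of regions, but the OpenMP specification does not guarantee this, so treat the result as an observation about one implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Runs `regions` consecutive parallel regions and returns how many distinct
// std::thread::id values executed them.
std::size_t countDistinctThreads(int regions) {
    assert(regions > 0);
    std::set<std::thread::id> seen;  // shared by the team
    std::mutex m;                    // guards `seen`
    for (int r = 0; r < regions; ++r) {
        #pragma omp parallel
        {
            std::lock_guard<std::mutex> lock(m);
            seen.insert(std::this_thread::get_id());
        }
    }
    return seen.size();
}
```

Compare the returned value with `omp_get_max_threads()` when building with `-fopenmp`; without OpenMP the function always sees a single thread.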

[Answer 2]:

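The OpenMP model is usually presented as a fork-join paradigm. But for performance reasons, threads are not killed at the end of the join. In some implementations, such as Intel OpenMP, threads spin-wait at the end of the join for a while and then go to sleep (see KMP_BLOCKTIME at https://software.intel.com/en-us/node/522775).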
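To illustrate the spin-then-sleep idea described above, here is a hypothetical waiter written with standard C++ primitives. It is not the Intel runtime's code; `SpinThenBlockFlag` and `spinFor` are illustrative names, and `spinFor` only plays a role roughly analogous to KMP_BLOCKTIME. Spinning gives fast wake-ups at the cost of CPU time; blocking frees the core but wakes more slowly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class SpinThenBlockFlag {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);  // store under the lock so a waiter
            ready_.store(true, std::memory_order_release);  // cannot miss the notify
        }
        cv_.notify_all();
    }

    // Returns true if the flag was seen while spinning, false if the waiter
    // fell back to sleeping on the condition variable.
    bool wait(std::chrono::microseconds spinFor) {
        assert(spinFor.count() >= 0);
        const auto deadline = std::chrono::steady_clock::now() + spinFor;
        while (std::chrono::steady_clock::now() < deadline) {
            if (ready_.load(std::memory_order_acquire)) return true;  // cheap wake-up
            std::this_thread::yield();  // let other threads run while spinning
        }
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
        return false;  // slept: less CPU use, higher wake-up latency
    }

private:
    std::atomic<bool> ready_{false};
    std::mutex m_;
    std::condition_variable cv_;
};
```

A longer spin window suits programs like the one in the question, where a new parallel region starts shortly after the previous one ends.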
