Summing and gathering array elements element-wise in MPI: answers

Question title: Summing and Gathering elements of array element-wise in MPI
Posted: 2017-04-03 01:45:08
Question:

After multiplying a matrix by a vector using a Cartesian topology, I end up with the following vectors on the processes (listed by rank):

P0 (process with rank 0) = [2, 9]
P1 (process with rank 1) = [2, 3]
P2 (process with rank 2) = [1, 9]
P3 (process with rank 3) = [4, 6]

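Now I need to sum the vectors of the even-ranked processes and of the odd-ranked processes separately, element by element, like this: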

temp1 = [3, 18]   (P0 + P2)
temp2 = [6, 9]    (P1 + P3)

Then I need to gather both results into another vector, like this:

result = [3, 18, 6, 9]

My attempt was to use MPI_Reduce followed by MPI_Gather, like this:

// Previous code
double *temp1, *temp2;
if (myrank % 2 == 0) {
    BOOLEAN flag = Allocate_vector(&temp1, local_m); // function to allocate space for vectors
    MPI_Reduce(local_y, temp1, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp1, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp1);
}
else {
    Allocate_vector(&temp2, local_m);
    MPI_Reduce(local_y, temp2, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp2, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp2);
}

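But the answer is wrong. It looks as if the code adds up the elements of all processes, even and odd together, and then it crashes. The wrong result is [21 15 0 0], and I also get this error: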

*** Error in `./test': double free or corruption (fasttop): 0x00000000013c7510 ***
*** Error in `./test': double free or corruption (fasttop): 0x0000000001605b60 ***

Comments on the question:

See How to create a Minimal, Complete, and Verifiable example

Tags: c++ parallel-processing mpi

Answer 1:

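It doesn't work the way you are trying to do it. To perform a reduction over the elements of a subset of processes, you have to create a subcommunicator for them. In your case, the odd and the even processes share the same comm, so the operations are not performed over two separate groups of processes but over the combined group. MPI matches collective calls by communicator, not by where they appear in the source, so the MPI_Reduce in the even branch and the one in the odd branch are parts of one and the same reduction over every process in comm.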

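You should split the communicator with MPI_Comm_split, perform the reductions over the two new subcommunicators, and finally have rank 0 of each subcommunicator (let's call these the leaders) take part in the gather over yet another subcommunicator that contains only those two processes: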

// Make sure rank is set accordingly

MPI_Comm_rank(comm, &rank);

// Split even and odd ranks in separate subcommunicators

MPI_Comm subcomm;
MPI_Comm_split(comm, rank % 2, 0, &subcomm);

// Perform the reduction in each separate group

double *temp;
Allocate_vector(&temp, local_n);
MPI_Reduce(local_y, temp, local_n , MPI_DOUBLE, MPI_SUM, 0, subcomm);

// Find out our rank in subcomm

int subrank;
MPI_Comm_rank(subcomm, &subrank);

// At this point, we no longer need subcomm. Free it and reuse the variable.

MPI_Comm_free(&subcomm);

// Separate both group leaders (rank 0) into their own subcommunicator

MPI_Comm_split(comm, subrank == 0 ? 0 : MPI_UNDEFINED, 0, &subcomm);
if (subcomm != MPI_COMM_NULL) {
  MPI_Gather(temp, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, subcomm);
  MPI_Comm_free(&subcomm);
}

// Free resources

free(temp);

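The result will end up in gResult on rank 0 of the latter subcomm, which, because of the way the splits are done, happens to be rank 0 in comm. Both splits pass 0 as the key, so the ranks in each new communicator follow the order of the ranks in comm: rank 0 of comm becomes the even-group leader and then rank 0 of the leaders' communicator, so the even sum comes first in gResult and the odd sum second.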

I guess it's not as simple as one might imagine, but that is the price of having convenient collective operations in MPI.

On a side note, in the code you show, temp1 and temp2 are allocated with length local_m, while every collective call uses local_n as the length. If local_n > local_m, this will corrupt the heap.

Comments:

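Thanks, I followed your suggestion. The code ran fine the first time, but when I ran it again it still printed the answer, followed by this error: Fatal error in PMPI_Comm_free: Invalid communicator, error stack: PMPI_Comm_free(143): MPI_Comm_free(comm=0x7ffe22d63200) failed PMPI_Comm_free(93).: Null communicator. Fatal error in PMPI_Comm_free: Invalid communicator, error stack: PMPI_Comm_free(143): MPI_Comm_free(comm=0x7ffca17bd180) failed PMPI_Comm_free(93).: Null communicator
It seems the communicator from that first line cannot be created, because the split returns MPI_COMM_NULL.
Ah, of course, my mistake. Processes should check whether they are part of subcomm before trying to free it at the end. Fixed.
If you intend to call this sequence multiple times, it actually makes sense to keep the two subcommunicators and free them only at the end.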