MPI partition and gather 2D array in Fortran: answers

[Question title]: MPI partition and gather 2D array in Fortran
[Posted]: 2014-08-01 10:09:44
[Question]:

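I have a 2D array on which every process runs some computation. Afterwards, I need to gather all the computed columns back on the root process. I currently partition the work on a first-come, first-served basis, which in practice deals the columns out to the processes in round-robin order. In pseudocode, the main loop looks like this: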

DO i = mpi_rank + 1, num_columns, mpi_size
   array(:,i) = do work here
END DO

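Once this is done, I need to gather these columns at the correct indices on the root process. What is the best way to do this? It looks like MPI_GATHERV could do what I want if the partitioning scheme were different. However, I am not sure what the best way to partition is, since num_columns is not necessarily evenly divisible by mpi_size.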

[Comments]:

Tags: fortran mpi

[Solution 1]:

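I suggest the following approach:

1. Cut the 2D array into chunks of "almost equal" size, i.e. with a local number of columns close to num_columns/mpi_size.
2. Gather the chunks with mpi_gatherv, which can handle chunks of different sizes.

To get an "almost equal" number of columns, set the local number of columns to the integer quotient num_columns / mpi_size and increment it by one only on the first mod(num_columns,mpi_size) MPI tasks.

The following table shows the partitioning of a (10,12) matrix over 5 MPI processes: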

  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42
  01  02  03  11  12  13  21  22  31  32  41  42

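Here the first digit is the id of the process and the second digit is the local column index. As you can see, processes 0 and 1 have 3 columns each, while all other processes have only 2 columns each.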
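The split shown in the table can also be written in closed form, which avoids accumulating displacements in a loop. The following is a minimal sketch, not part of the original answer: the module name block_partition and the functions block_count and block_offset are hypothetical, ranks are assumed to be 0-based as returned by mpi_comm_rank, and the results are expressed in columns rather than array elements.

```fortran
module block_partition
  implicit none
contains

  ! Number of columns owned by 0-based rank r when n columns are
  ! split over p ranks: the first mod(n,p) ranks get one extra column.
  pure integer function block_count(n, p, r) result(cnt)
    integer, intent(in) :: n, p, r
    cnt = n / p
    if (r < mod(n, p)) cnt = cnt + 1
  end function block_count

  ! 0-based offset (in columns) of the first column owned by rank r.
  ! Every rank before r contributes n/p columns, and min(r, mod(n,p))
  ! of those ranks contribute one extra column.
  pure integer function block_offset(n, p, r) result(off)
    integer, intent(in) :: n, p, r
    off = r * (n / p) + min(r, mod(n, p))
  end function block_offset

end module block_partition
```

For n = 12 and p = 5 this gives counts 3, 3, 2, 2, 2 and offsets 0, 3, 6, 8, 10 for ranks 0 to 4. Multiplying both by the number of rows converts them to the element counts and displacements that mpi_gatherv expects.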

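Below you can find the working example code I wrote. The trickiest part is generating the rcounts and displs arrays for MPI_Gatherv. The table discussed above is the output of this code.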

  ! Free-form Fortran source (e.g. save as mpi2d.f90)
  program mpi2d
  implicit none
  include 'mpif.h'
  integer :: myid, nprocs, ierr
  integer, parameter :: m = 10       ! global number of rows
  integer, parameter :: n = 12       ! global number of columns
  integer :: nloc                    ! local number of columns
  integer :: array(m,n)              ! global m-by-n array, i.e. m rows and n columns
  integer, allocatable :: loc(:,:)   ! local piece of the global 2D array
  integer, allocatable :: rcounts(:) ! array of nloc's (for mpi_gatherv)
  integer, allocatable :: displs(:)  ! array of displacements (for mpi_gatherv)
  integer :: i, j

  ! Initialize
  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Partition, i.e. get the local number of columns:
  ! the first mod(n,nprocs) ranks get one extra column
  nloc = n / nprocs
  if (mod(n,nprocs) > myid) nloc = nloc + 1

  ! Compute the partitioned array: each entry encodes
  ! (process id)*10 + (local column index)
  allocate(loc(m,nloc))
  do j = 1, nloc
    loc(:,j) = myid*10 + j
  enddo

  ! Build the arrays for mpi_gatherv:
  ! rcounts contains the nloc of every rank
  ! displs  contains the displacements of the partitions, in columns
  allocate(rcounts(nprocs), displs(nprocs))
  displs(1) = 0
  do j = 1, nprocs
    rcounts(j) = n / nprocs
    if (mod(n,nprocs) > (j-1)) rcounts(j) = rcounts(j) + 1
    if (j > 1) displs(j) = displs(j-1) + rcounts(j-1)
  enddo

  ! Convert from number of columns to number of integers
  nloc    = m * nloc
  rcounts = m * rcounts
  displs  = m * displs

  ! Gather the array on root (rank 0).
  ! MPI_INTEGER is the MPI datatype for a default Fortran integer.
  call mpi_gatherv(loc, nloc, MPI_INTEGER, array, &
                   rcounts, displs, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  ! Print the array on root
  if (myid == 0) then
    do i = 1, m
      do j = 1, n
        write(*,'(I4.2)',advance='no') array(i,j)
      enddo
      write(*,*)
    enddo
  endif

  ! Finish
  call mpi_finalize(ierr)

  end program mpi2d

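[Comments]:

[Solution 2]:

How about gathering in chunks of mpi_size columns?

To keep this short, I will assume that num_columns is a multiple of mpi_size. In your case, the gather should look something like this (lda is the first dimension of array):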

DO i = 1, num_columns/mpi_size
  IF (rank == 0) THEN
    ! Root: its own column of round i is already in place in array
    CALL MPI_GATHER(MPI_IN_PLACE, lda, [TYPE], &
                    array(1,(i-1)*mpi_size+1), lda, [TYPE], 0, MPI_COMM_WORLD, ierr)
  ELSE
    ! Other ranks send their column of round i; the receive arguments are ignored here
    CALL MPI_GATHER(array(1, rank + (i-1)*mpi_size + 1), lda, [TYPE], &
                    array(1,(i-1)*mpi_size+1), lda, [TYPE], 0, MPI_COMM_WORLD, ierr)
  END IF
END DO

I am not entirely sure the indexing actually works out, but I think you get the idea.
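The index arithmetic used in the loop above can be checked without MPI. The following is a minimal sketch under the same assumption that num_columns is a multiple of mpi_size; the module cyclic_index and its functions are hypothetical helpers introduced only for this check. Ranks are 0-based, and columns and rounds are 1-based.

```fortran
module cyclic_index
  implicit none
contains

  ! Global column that 0-based rank r handles in round i
  ! (same expression as the send buffer index in the loop above).
  pure integer function cyclic_column(i, r, p) result(col)
    integer, intent(in) :: i, r, p
    col = (i - 1) * p + r + 1
  end function cyclic_column

  ! 0-based rank that owns global column j under the question's loop
  ! "DO i = mpi_rank + 1, num_columns, mpi_size".
  pure integer function column_owner(j, p) result(r)
    integer, intent(in) :: j, p
    r = mod(j - 1, p)
  end function column_owner

  ! Round (1-based) in which global column j is gathered.
  pure integer function column_round(j, p) result(i)
    integer, intent(in) :: j, p
    i = (j - 1) / p + 1
  end function column_round

end module cyclic_index
```

Under these definitions, cyclic_column(column_round(j,p), column_owner(j,p), p) returns j for every column j, so each rank sends exactly the columns it computed in the question's loop. On the root, contiguous blocks of lda elements received from ranks 0 to mpi_size-1 starting at column (i-1)*mpi_size+1 land in columns (i-1)*mpi_size+1 through i*mpi_size, which is the same mapping.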

[Comments]: