MPI: partitioning a matrix into blocks (answers)

【Question title】: MPI partition matrix into blocks
【Posted】: 2011-11-24 20:18:39
【Question】:

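I want to partition a matrix into blocks (not stripes) and then distribute these blocks using MPI_Scatter.

I came up with a solution that works, but I think it is far from "best practice". I have an 8x8 matrix filled with the numbers 0 to 63. I divide it into four 4x4 blocks using MPI_Type_vector and distribute them with MPI_Send, but this requires some extra computation, since I have to compute the offset of each block in the big matrix.

If I use scatter, the first (top-left) block is transferred correctly, but the other blocks are not (the offset of the start of each block is wrong).

So is it possible to transfer blocks of a matrix using MPI_Scatter, or what is the best way to do the desired decomposition?

This is my code: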

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define SIZE 8


int main(void) {

        MPI_Init(NULL, NULL);
        int p, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char i;

        char a[SIZE*SIZE];
        char b[(SIZE/2)*(SIZE/2)];

        MPI_Datatype columntype;
        MPI_Datatype columntype2;

        MPI_Type_vector(4, 4, SIZE, MPI_CHAR, &columntype2);
        MPI_Type_create_resized( columntype2, 0, sizeof(MPI_CHAR), &columntype );
        MPI_Type_commit(&columntype);

        if(rank == 0) {
                for( i = 0; i < SIZE*SIZE; i++) {
                        a[i] = i;
                }

                for(int rec=0; rec < p; rec++) {
                        int offset = (rec%2)*4 + (rec/2)*32;
                      MPI_Send (a+offset, 1, columntype, rec, 0, MPI_COMM_WORLD);
                }
        }
        MPI_Recv (b, 16, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        //MPI_Scatter(&a, 1, boki, &b, 16, MPI_CHAR , 0, MPI_COMM_WORLD);

        printf("rank= %d  b= \n%d %d %d %d\n%d %d %d %d\n%d %d %d %d\n%d %d %d %d\n", rank, b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);

        MPI_Finalize();

        return 0;
}

【Question comments】:

Tags: c matrix mpi scatter

【Solution 1】:

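What you have is pretty much "best practice"; it is just a bit confusing until you get used to it.

Two things, though:

First, be careful with this: sizeof(MPI_CHAR) is, I assume, 4 bytes, not 1. MPI_CHAR is an (integer) constant that describes a char to the MPI library. You probably want sizeof(char), or SIZE/2*sizeof(char), or whatever else you find convenient. But the basic idea of resizing the type is right.

Second, I think you are stuck using MPI_Scatterv, because there is no easy way to make the offset between consecutive blocks the same size. That is, the first element of the first block is at a[0], the first element of the second block is at a[SIZE/2] (a jump of SIZE/2), and the first element of the next block is at a[SIZE*(SIZE/2)] (a jump of (SIZE-1)*(SIZE/2)). So you need to be able to generate the offsets manually.

The following seems to work for me (I generalized it a little to make it clearer when "size" means "number of rows" versus "number of columns", and so on):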
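To make the uneven jumps concrete, here is a minimal plain-C sketch (no MPI needed) of the offset arithmetic. The helper name `block_displacement` and its parameters are made up for illustration; it assumes a row-major matrix whose dimensions divide evenly into blocks.

```c
#include <assert.h>

/* Offset, in elements, of the first element of block (block_row, block_col)
 * inside a row-major matrix with `cols` columns that is split into blocks
 * of blockrows x blockcols elements. Hypothetical helper for illustration. */
int block_displacement(int block_row, int block_col,
                       int cols, int blockrows, int blockcols) {
    assert(cols > 0 && blockrows > 0 && blockcols > 0);
    /* Skip whole rows of blocks first, then whole blocks within the row. */
    return block_row * cols * blockrows + block_col * blockcols;
}
```

For the 8x8 matrix with 4x4 blocks from the question, blocks (0,0), (0,1), (1,0) and (1,1) start at offsets 0, 4, 32 and 36. The gaps between consecutive blocks are 4, 28 and 4, so no single fixed spacing fits, which is why the displacements have to be listed explicitly.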

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define COLS  12
#define ROWS  8

int main(int argc, char **argv) {

    MPI_Init(&argc, &argv);
    int p, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char i;

    char a[ROWS*COLS];
    const int NPROWS=2;  /* number of rows in _decomposition_ */
    const int NPCOLS=3;  /* number of cols in _decomposition_ */
    const int BLOCKROWS = ROWS/NPROWS;  /* number of rows in _block_ */
    const int BLOCKCOLS = COLS/NPCOLS; /* number of cols in _block_ */

    if (rank == 0) {
        for (int ii=0; ii<ROWS*COLS; ii++) {
            a[ii] = (char)ii;
        }
    }

    if (p != NPROWS*NPCOLS) {
        fprintf(stderr,"Error: number of PEs %d != %d x %d\n", p, NPROWS, NPCOLS);
        MPI_Finalize();
        exit(-1);
    }
    char b[BLOCKROWS*BLOCKCOLS];
    for (int ii=0; ii<BLOCKROWS*BLOCKCOLS; ii++) b[ii] = 0;

    MPI_Datatype blocktype;
    MPI_Datatype blocktype2;

    MPI_Type_vector(BLOCKROWS, BLOCKCOLS, COLS, MPI_CHAR, &blocktype2);
    MPI_Type_create_resized( blocktype2, 0, sizeof(char), &blocktype);
    MPI_Type_commit(&blocktype);

    int disps[NPROWS*NPCOLS];
    int counts[NPROWS*NPCOLS];
    for (int ii=0; ii<NPROWS; ii++) {
        for (int jj=0; jj<NPCOLS; jj++) {
            disps[ii*NPCOLS+jj] = ii*COLS*BLOCKROWS+jj*BLOCKCOLS;
            counts [ii*NPCOLS+jj] = 1;
        }
    }

    MPI_Scatterv(a, counts, disps, blocktype, b, BLOCKROWS*BLOCKCOLS, MPI_CHAR, 0, MPI_COMM_WORLD);
    /* each proc prints its "b" out, in order */
    for (int proc=0; proc<p; proc++) {
        if (proc == rank) {
            printf("Rank = %d\n", rank);
            if (rank == 0) {
                printf("Global matrix: \n");
                for (int ii=0; ii<ROWS; ii++) {
                    for (int jj=0; jj<COLS; jj++) {
                        printf("%3d ",(int)a[ii*COLS+jj]);
                    }
                    printf("\n");
                }
            }
            printf("Local Matrix:\n");
            for (int ii=0; ii<BLOCKROWS; ii++) {
                for (int jj=0; jj<BLOCKCOLS; jj++) {
                    printf("%3d ",(int)b[ii*BLOCKCOLS+jj]);
                }
                printf("\n");
            }
            printf("\n");
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();

    return 0;
}

Running:

$ mpirun -np 6 ./matrix

Rank = 0
Global matrix: 
  0   1   2   3   4   5   6   7   8   9  10  11 
 12  13  14  15  16  17  18  19  20  21  22  23 
 24  25  26  27  28  29  30  31  32  33  34  35 
 36  37  38  39  40  41  42  43  44  45  46  47 
 48  49  50  51  52  53  54  55  56  57  58  59 
 60  61  62  63  64  65  66  67  68  69  70  71 
 72  73  74  75  76  77  78  79  80  81  82  83 
 84  85  86  87  88  89  90  91  92  93  94  95 
Local Matrix:
  0   1   2   3 
 12  13  14  15 
 24  25  26  27 
 36  37  38  39 

Rank = 1
Local Matrix:
  4   5   6   7 
 16  17  18  19 
 28  29  30  31 
 40  41  42  43 

Rank = 2
Local Matrix:
  8   9  10  11 
 20  21  22  23 
 32  33  34  35 
 44  45  46  47 

Rank = 3
Local Matrix:
 48  49  50  51 
 60  61  62  63 
 72  73  74  75 
 84  85  86  87 

Rank = 4
Local Matrix:
 52  53  54  55 
 64  65  66  67 
 76  77  78  79 
 88  89  90  91 

Rank = 5
Local Matrix:
 56  57  58  59 
 68  69  70  71 
 80  81  82  83 
 92  93  94  95
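
The output above can be cross-checked without MPI. The sketch below mimics, in plain C, the element pattern selected by a vector layout of BLOCKROWS runs of BLOCKCOLS contiguous elements with a stride of COLS, starting at a given displacement. The function name `extract_block` is an assumption for illustration and is not part of MPI.

```c
#include <assert.h>

/* Copy one block out of a row-major matrix `a`: `blockrows` runs of
 * `blockcols` contiguous elements, consecutive runs `cols` elements apart,
 * starting at element offset `disp`. The block is packed contiguously into
 * `out`, which must hold blockrows * blockcols elements. */
void extract_block(const char *a, int cols, int disp,
                   int blockrows, int blockcols, char *out) {
    assert(blockcols <= cols);
    for (int r = 0; r < blockrows; r++) {
        for (int c = 0; c < blockcols; c++) {
            out[r * blockcols + c] = a[disp + r * cols + c];
        }
    }
}
```

With the 8x12 matrix and displacement 52 (the entry computed for rank 4 in the loop that fills `disps`), the copied block starts with 52 53 54 55 and ends with 88 89 90 91, matching the local matrix printed by rank 4.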

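【Comments】:

Sorry, what I meant was the case where the rows or columns are not evenly divisible by nprows and npcols.
Oh; so that's not too bad. I just didn't include it in this example because it introduces a lot of bookkeeping, which distracts from the main point I was trying to make about MPI_Scatterv. You would use MPI_Dims_create (for instance) to calculate nprows and npcols from p; then you would calculate blockrows and blockcols rather than defining them. (That also means you would have to allocate the local arrays dynamically instead of declaring them statically.) If the sizes are not evenly divided by nprows and npcols, the easiest approach is to have the last proc in each row/column take whatever is left over.
I think you should also free the MPI types; otherwise there is a memory leak.
@JonathanDursi Could you answer my [question] (stackoverflow.com/questions/43737456/…) on the same topic? I tried your suggestion, but it doesn't work for me.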
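
As a rough sketch of the bookkeeping mentioned in the comments for sizes that do not divide evenly, the plain-C helpers below give the last part along a dimension whatever is left over. The names `part_start` and `part_size` are hypothetical, and this covers only the size arithmetic, not the MPI datatypes or calls needed to send blocks of differing shapes.

```c
#include <assert.h>

/* First index (row or column) owned by part `idx` when `n` indices are
 * split into `nparts` parts. Hypothetical helper for illustration. */
int part_start(int n, int nparts, int idx) {
    assert(nparts > 0 && idx >= 0 && idx < nparts);
    return idx * (n / nparts);
}

/* Number of indices owned by part `idx`: every part gets n / nparts,
 * and the last part additionally takes the remainder. */
int part_size(int n, int nparts, int idx) {
    assert(nparts > 0 && idx >= 0 && idx < nparts);
    int base = n / nparts;
    return (idx == nparts - 1) ? base + n % nparts : base;
}
```

For example, splitting 10 rows over 3 process rows gives sizes 3, 3 and 4, starting at rows 0, 3 and 6. The same helpers can be applied to columns with npcols.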