MPI_Scatter of 2D array and malloc: answers

[Question title]: MPI_Scatter of 2D array and malloc
[Posted]: 2013-11-30 15:03:03
[Question]:

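I'm trying to write a program in C with the MPI library in which the master process creates a 2D array and distributes its rows to the other processes. The matrix has dimensions p*p, where p is the number of processes.

Here's the code: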

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int **createMatrix (int nrows, int ncols) {
    int **matrix;
    int h, i, j;

    if (( matrix = malloc(nrows*sizeof(int*))) == NULL) {
        printf("Malloc error");
        exit(1);
    }

    for (h=0; h<nrows; h++) {
        if (( matrix[h] = malloc( ncols * sizeof(int))) == NULL) {
            printf("Malloc error 2");
            exit(1);
        }
    }

    for (i=0; i<ncols; i++) {
        for (j=0; j<nrows; j++) {
            matrix[i][j] = ((i*nrows) + j);
        }
    }

    return matrix;
}

void printArray (int *row, int nElements) {
    int i;
    for (i=0; i<nElements; i++) {
        printf("%d ", row[i]);
    }
    printf("\n");
}

void printMatrix (int **matrix, int nrows, int ncols) {
    int i;
    for (i=0; i<nrows; i++) {
        printArray(matrix[i], ncols);
    }
}

int main (int argc, char **argv) {

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        perror("Error initializing MPI");
        exit(1);
    }

    int p, id;
    MPI_Comm_size(MPI_COMM_WORLD, &p); // Get number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &id); // Get own ID

    int **matrix;

    if (id == 0) {
        matrix = createMatrix(p, p); // Master process creates matrix
        printf("Initial matrix:\n");
        printMatrix(matrix, p, p);
    }

    int *procRow = malloc(sizeof(int) * p); // received row will contain p integers
    if (procRow == NULL) {
        perror("Error in malloc 3");
        exit(1);
    }

    if (MPI_Scatter(*matrix, p, MPI_INT, // send one row, which contains p integers
                    procRow, p, MPI_INT, // receive one row, which contains p integers
                    0, MPI_COMM_WORLD) != MPI_SUCCESS) {

        perror("Scatter error");
        exit(1);
    }

    printf("Process %d received elements: ", id);
    printArray(procRow, p);

    MPI_Finalize();

    return 0;
}

The output I get when running this code is

$ mpirun -np 4 test
Initial matrix:
0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 
Process 0 received elements: 0 1 2 3 
Process 1 received elements: 1 50 32 97 
Process 2 received elements: -1217693696 1 -1217684120 156314784 
Process 3 received elements: 1 7172196 0 0

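Process 0 apparently receives the correct input, but the other processes show numbers I can't make sense of. Also note that the numbers for processes 1 and 3 are consistent over multiple runs of the program, whereas the numbers for process 2 change on every run.

It seems to me that something is wrong with my memory allocation or pointer usage, but I'm quite new to C programming. Can anyone explain how and why this output is produced? Secondly, obviously, I'm also interested in how to solve my problem :) Thanks in advance!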

[Comments]:

See Correctly allocating multi-dimensional arrays.

Tags: c arrays malloc mpi

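[Solution 1]:

I think you are fundamentally misunderstanding what a scatter operation does and how MPI expects memory to be allocated and used.

MPI_Scatter takes a source array and splits it into pieces, sending a unique piece to each member of the MPI communicator. In your example, the matrix would need to be a single allocation of p*p contiguous elements in linear memory, from which p values are sent to each process. Your source "matrix" is instead an array of pointers. There is no guarantee that the rows are laid out one after another in memory, and MPI_Scatter does not know how to traverse the array of pointers you passed it. As a result, the call simply reads past the end of the first row, which you passed by dereferencing the matrix pointer, and treats whatever follows it in memory as data. That is why the processes that receive data after the first row get garbage values.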
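To make the slicing concrete, here is a minimal sketch without any MPI calls. The helper `scatter_slice` is hypothetical and not part of MPI; it only mimics which elements of a contiguous send buffer a given rank would end up with when every rank receives `count` integers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT an MPI function: copies the block that rank
 * `rank` would receive from a contiguous send buffer when every rank
 * is sent `count` ints. */
void scatter_slice(const int *sendbuf, int count, int rank, int *recvbuf) {
    assert(sendbuf != NULL && recvbuf != NULL);
    assert(count >= 0 && rank >= 0);
    /* Rank r's data starts at element r * count of the send buffer. */
    memcpy(recvbuf, sendbuf + (size_t)rank * count,
           (size_t)count * sizeof(int));
}
```

With a 16-element buffer and `count` set to 4, rank 2 gets elements 8 through 11 of that buffer. If the buffer is only one 4-element row, as in the question, the same arithmetic reads memory that was never part of the matrix.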

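All MPI data copy routines expect source and destination memory to be "flat" linear arrays. Multidimensional C arrays should be stored in row major order, not in arrays of pointers as you have done here. A cheap and nasty hack of your example that illustrates the scatter call working correctly would look like this: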

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int *createMatrix (int nrows, int ncols) {
    int *matrix;
    int h;

    if (( matrix = malloc(nrows*ncols*sizeof(int))) == NULL) {
        printf("Malloc error");
        exit(1);
    }

    for (h=0; h<nrows*ncols; h++) {
        matrix[h] = h+1;
    }

    return matrix;
}

void printArray (int *row, int nElements) {
    int i;
    for (i=0; i<nElements; i++) {
        printf("%d ", row[i]);
    }
    printf("\n");
}

int main (int argc, char **argv) {

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        perror("Error initializing MPI");
        exit(1);
    }

    int p, id;
    MPI_Comm_size(MPI_COMM_WORLD, &p); // Get number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &id); // Get own ID

    int *matrix = NULL; // send buffer is only significant at the root

    if (id == 0) {
        matrix = createMatrix(p, p); // Master process creates matrix
        printf("Initial matrix:\n");
        printArray(matrix, p*p);
    }

    int *procRow = malloc(sizeof(int) * p); // received row will contain p integers
    if (procRow == NULL) {
        perror("Error in malloc 3");
        exit(1);
    }

    if (MPI_Scatter(matrix, p, MPI_INT, // send one row, which contains p integers
                procRow, p, MPI_INT, // receive one row, which contains p integers
                0, MPI_COMM_WORLD) != MPI_SUCCESS) {

        perror("Scatter error");
        exit(1);
    }

    printf("Process %d received elements: ", id);
    printArray(procRow, p);

    MPI_Finalize();

    return 0;
}

which does this:

$ mpicc -o scatter scatter.c 
$ mpiexec -np 4 scatter
Initial matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 
Process 0 received elements: 1 2 3 4 
Process 1 received elements: 5 6 7 8 
Process 2 received elements: 9 10 11 12 
Process 3 received elements: 13 14 15 16

i.e. it works when you pass data stored in linear memory. The equivalent row major array would be statically allocated like this:

int matrix[4][4] = { {  1,  2,  3,  4 }, 
                     {  5,  6,  7,  8 },
                     {  9, 10, 11, 12 },
                     { 13, 14, 15, 16 } };

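Note the difference between the statically allocated two-dimensional array and the array of pointers your code allocates dynamically. Although they look superficially similar, they are not the same thing at all.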
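If you want to keep the `matrix[i][j]` syntax with dynamic allocation, one common approach (a sketch, not something the code above does) is to allocate all elements in a single block and point the row pointers into it. The helper names `alloc_contiguous` and `free_contiguous` are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an nrows x ncols int matrix whose
 * elements live in ONE contiguous block, plus an array of row pointers
 * into that block. Returns NULL if either allocation fails. */
int **alloc_contiguous(int nrows, int ncols) {
    assert(nrows > 0 && ncols > 0);
    int *data = malloc((size_t)nrows * ncols * sizeof(int));
    int **rows = malloc((size_t)nrows * sizeof(int *));
    if (data == NULL || rows == NULL) {
        free(data);
        free(rows);
        return NULL;
    }
    for (int i = 0; i < nrows; i++) {
        /* Row i starts i * ncols ints into the data block. */
        rows[i] = data + (size_t)i * ncols;
    }
    return rows;
}

/* Frees the data block (reachable through rows[0]) and the row pointers. */
void free_contiguous(int **rows) {
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

With this layout, `m[0]` (equivalently `&m[0][0]`) points at nrows*ncols contiguous integers in row major order, so on the root it can be passed as the send buffer to MPI_Scatter. On the other ranks the send buffer is ignored, but the pointer expression must still be valid to evaluate, so do not dereference an unallocated `m` there.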

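[Comments]:

Thanks for your reply, but isn't a 2D matrix simply an "array of arrays"? I.e. isn't a 3x3 matrix just an array of 3 elements (each itself an array), all of which contain 3 elements? By that logic, I thought MPI_Scatter should also be able to scatter the sub-arrays (i.e. rows) of my matrix, but maybe that is where my reasoning goes wrong. Also, could you shed some light on the output my program produces? Or is that simply the result of using MPI_Scatter incorrectly?
Scatter expects the array it scatters to be stored in a single contiguous memory allocation. Your "matrix" is really an array of pointers, with each row allocated independently of the others. There is no guarantee that they follow one another in memory. The problem in your code is that you pass the first row by dereferencing the pointer, but scatter then keeps reading past the allocation of the first row, treating whatever is stored after it as data. It knows nothing about the layout of your array of pointers in memory, so what it scatters from beyond the end of the first row is just garbage.
@rvw: I have edited the answer to try to make it a bit clearer. When I originally looked at your code, I assumed you thought scatter worked more like a send and that you only wanted to scatter the first row, but now I suspect this is a misunderstanding about arrays and how they are stored in memory in C. I hope it makes more sense to you now. An upvote/accept would be greatly appreciated....
Thank you so much, your second answer was exactly what I was looking for!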