【Question Title】: MPI Segmentation Fault On Multiple Nodes Only
【Posted】: 2020-05-30 18:07:05
【Question】:

So I am currently building the foundation of a control program meant to run across multiple Raspberry Pis, using every available core on each Pi. When I test my code on a single node using all of its cores it works fine, but using multiple nodes gives me a segmentation fault.

I have looked through all the similar questions asked in the past, but they all involve code that breaks on a single node as well.

Full code:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdbool.h>
#include <time.h> 
int main(int argc, char *argv[])
{
        FILE *input;
        char batLine[86];   //may need to be made larger if bat commands get longer
        char sentbatch[86];
        int currentTask;
        int numTasks, rank, rc, i;
        MPI_Status stat;
        bool exitFlag = false;

        //mpi stuff
        MPI_Init(&argc,&argv);  //initilize mpi enviroment
        MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        //printf("Number of tasks: %d \n", numTasks);
        //printf ("MPI task %d has started...\n", rank);
        if(argc != 2)
        {
            printf("Usage: batallocation *.bat");
            exit(1); //exit with 1 indicates a failure
        }
        //contains file name: argv[1]
        input = fopen(argv[1],"r");

        currentTask = 0;
        if (rank ==0)
        {
            while(1)
            {
                if(exitFlag)
                    break; //allows to break out of while and for when no more lines exist
                char command[89] = "./";
                for(i=0; i < 16; i++) //will need to be 16 for full testing
                {

                    //fgets needs to be character count of longest line + 2 or it fails
                    if(fgets(batLine,86,input) != NULL)
                    {
                        printf("preview:%s\n",batLine);
                        if(i==0)
                        {
                            strcat(command,batLine);
                            printf("rank0 gets: %s\n", command);
                            //system(command);
                        }
                        else
                        {
                            //MPI_Send(buffer,count,type,dest,tag,comm)
                            MPI_Send(batLine,85,MPI_CHAR,i,i,MPI_COMM_WORLD); 
                            printf("sent rank%d: %s\n",i,batLine);
                        }
                    }
                    else
                    {
                        exitFlag = true; //flag to break out of while loop
                        break;
                    }


                }   
                //need to recieve data from other nodes here
                //put the data together in proper order
                //and only after that can the next sets be sent out

            }
        }
        else
        {
            char command[89] = "./";
            //MPI_Recv(buffer,count,type,source,tag,comm,status)
            MPI_Recv(sentbatch,86,MPI_CHAR,0,rank,MPI_COMM_WORLD,&stat);
            //using rank as flag makes it so only the wanted rank gets sent the data
            strcat(command,sentbatch); //adds needed ./ before batch data
            printf("rank=%d recieved data:%s",rank,sentbatch);
            //system(command); //should run batch line
        }
        fclose(input);
        MPI_Finalize();
        return(0);
}

Contents of the file being passed in:


LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-010.flx spec-56321-GAC099N59V1_sp01-010.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-013.flx spec-56321-GAC099N59V1_sp01-013.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-015.flx spec-56321-GAC099N59V1_sp01-015.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-018.flx spec-56321-GAC099N59V1_sp01-018.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-022.flx spec-56321-GAC099N59V1_sp01-022.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-023.flx spec-56321-GAC099N59V1_sp01-023.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-024.flx spec-56321-GAC099N59V1_sp01-024.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-025.flx spec-56321-GAC099N59V1_sp01-025.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-028.flx spec-56321-GAC099N59V1_sp01-028.nor f
LAMOSTv108 spec-56321-GAC099N59V1_sp01-029.flx spec-56321-GAC099N59V1_sp01-029.nor f

You will notice there are some things I have not done yet that will be in the final version; they are left commented out to make troubleshooting easier. Mainly because the LAMOST code is not fast and I do not want to wait for it to finish.

The command and output that work:

 $mpiexec -N 4 --host 10.0.0.3 -oversubscribe batTest2 shortpass2.bat
preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

rank0 gets: ./LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

sent rank1: LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

sent rank2: LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

sent rank3: LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

rank=1 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f
rank=3 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f
rank=2 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

Shortpass2 is the same file but with only the first 4 lines. My code should in theory work for all 16 lines, but I will test it with the full file once the current problem is fixed.

Command and output when running on multiple nodes:

$mpiexec -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe batTest2 shortpass.bat

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

rank0 gets: ./LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

sent rank1: LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

rank=1 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f
sent rank2: LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

rank=2 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f
sent rank3: LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f

rank=3 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f
sent rank4: LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f

rank=4 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f
[node2:27622] *** Process received signal ***
[node2:27622] Signal: Segmentation fault (11)
[node2:27622] Signal code: Address not mapped (1)
[node2:27622] Failing at address: (nil)
[node2:27622] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
corrupted double-linked list
Aborted

Sometimes it successfully reaches rank 5 before aborting entirely, and there will be multiple instances of the same error message. Also, Open MPI was installed with multithreading support, so that is not the issue. This is my first time using MPI, but this is not the first part of the overall project, and I have done a lot of research on MPI just to get this far.

I know it is not caused by my arrays, because then it would break on node1 too. All of the Pis are identical, so it would make no sense for the arrays to cause a segmentation fault on some of them only. (Though I admit I have run into that problem several times while working on other parts of this project, since I am more used to how Java and C# handle arrays.)

Edit: I checked whether I could run it across 4 cores from one of the other nodes, and it works fine and produces the same output as on node1. So that confirms it is not an array problem that only occurs on the other nodes. Also added a line that was missing from the posted code for the preview printouts.

Edit2: Per Gilles' suggestion: the code also works when running 16 tasks on one node. Here is the output:

$ mpiexec -N 16 --host 10.0.0.3 -oversubscribe batTest4 shortpass.bat
preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

rank0 gets: ./LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

sent rank1: LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f

sent rank2: LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f
preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

sent rank3: LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f

sent rank4: LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f

sent rank5: LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-010.flx spec-56321-GAC099N59V1_sp01-010.nor f

sent rank6: LAMOSTv108 spec-56321-GAC099N59V1_sp01-010.flx spec-56321-GAC099N59V1_sp01-010.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-013.flx spec-56321-GAC099N59V1_sp01-013.nor f

sent rank7: LAMOSTv108 spec-56321-GAC099N59V1_sp01-013.flx spec-56321-GAC099N59V1_sp01-013.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-015.flx spec-56321-GAC099N59V1_sp01-015.nor f

sent rank8: LAMOSTv108 spec-56321-GAC099N59V1_sp01-015.flx spec-56321-GAC099N59V1_sp01-015.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-018.flx spec-56321-GAC099N59V1_sp01-018.nor f

sent rank9: LAMOSTv108 spec-56321-GAC099N59V1_sp01-018.flx spec-56321-GAC099N59V1_sp01-018.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-022.flx spec-56321-GAC099N59V1_sp01-022.nor f

sent rank10: LAMOSTv108 spec-56321-GAC099N59V1_sp01-022.flx spec-56321-GAC099N59V1_sp01-022.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-023.flx spec-56321-GAC099N59V1_sp01-023.nor f

sent rank11: LAMOSTv108 spec-56321-GAC099N59V1_sp01-023.flx spec-56321-GAC099N59V1_sp01-023.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-024.flx spec-56321-GAC099N59V1_sp01-024.nor f

sent rank12: LAMOSTv108 spec-56321-GAC099N59V1_sp01-024.flx spec-56321-GAC099N59V1_sp01-024.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-025.flx spec-56321-GAC099N59V1_sp01-025.nor f

sent rank13: LAMOSTv108 spec-56321-GAC099N59V1_sp01-025.flx spec-56321-GAC099N59V1_sp01-025.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-028.flx spec-56321-GAC099N59V1_sp01-028.nor f

sent rank14: LAMOSTv108 spec-56321-GAC099N59V1_sp01-028.flx spec-56321-GAC099N59V1_sp01-028.nor f

preview:LAMOSTv108 spec-56321-GAC099N59V1_sp01-029.flx spec-56321-GAC099N59V1_sp01-029.nor f

sent rank15: LAMOSTv108 spec-56321-GAC099N59V1_sp01-029.flx spec-56321-GAC099N59V1_sp01-029.nor f

rank=3 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-005.flx spec-56321-GAC099N59V1_sp01-005.nor f
rank=5 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-008.flx spec-56321-GAC099N59V1_sp01-008.nor f
rank=6 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-010.flx spec-56321-GAC099N59V1_sp01-010.nor f
rank=7 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-013.flx spec-56321-GAC099N59V1_sp01-013.nor f
rank=11 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-023.flx spec-56321-GAC099N59V1_sp01-023.nor f
rank=12 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-024.flx spec-56321-GAC099N59V1_sp01-024.nor f
rank=9 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-018.flx spec-56321-GAC099N59V1_sp01-018.nor f
rank=2 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-004.flx spec-56321-GAC099N59V1_sp01-004.nor f
rank=4 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-006.flx spec-56321-GAC099N59V1_sp01-006.nor f
rank=8 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-015.flx spec-56321-GAC099N59V1_sp01-015.nor f
rank=10 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-022.flx spec-56321-GAC099N59V1_sp01-022.nor f
rank=15 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-029.flx spec-56321-GAC099N59V1_sp01-029.nor f
rank=1 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-003.flx spec-56321-GAC099N59V1_sp01-003.nor f
rank=13 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-025.flx spec-56321-GAC099N59V1_sp01-025.nor f
rank=14 recieved data:LAMOSTv108 spec-56321-GAC099N59V1_sp01-028.flx spec-56321-GAC099N59V1_sp01-028.nor f

【Discussion】:

  • Ranks 4 and above receive but are never sent anything. Try running 16 tasks on a single node and see what happens (you will need mpirun --oversubscribe ...)
  • When you run with mpiexec -N 4 ..., how do you get output such as rank=4 received data ...? Ranks range from 0 to 3 in that case.
  • It seems mpiexec -N 4 --oversubscribe ... means 4 MPI tasks per node (at least with Open MPI v2.0.x). In any case, the for() MPI_Send() loop never sends to rank 4 and above (and there is no such thing as preview in the code), so the output clearly does not match the code you posted.
  • Gilles is correct. Using "-N" instead of -n makes it execute 4 tasks per node. As for whether running 16 tasks on one node is even allowed: each Pi only has 4 cores. As for the preview, I must have forgotten to add it when copying into Notepad; it belongs right before "if (rank ==0)". It is a test to make sure fgets is working properly.
  • Gilles, I tried running 16 tasks on one node as you suggested in your earlier comments and it works fine. I will edit the output into my original question in case it is of any use.

Tags: c raspberry-pi mpi


【Solution 1】:

Not sure if this is the problem, but it is certainly a problem:

You are reading into and sending 85 characters from batLine:

char batLine[86];

//fgets needs to be character count of longest line + 2 or it fails
if(fgets(batLine,86,input) != NULL)
{
    // ...
    MPI_Send(batLine,85,MPI_CHAR,i,i,MPI_COMM_WORLD);
    // ...
}

Given that batLine[] has 86 elements and LAMOSTv108 spec-56321-GAC099N59V1_sp01-001.flx spec-56321-GAC099N59V1_sp01-001.nor f\n is 85 characters long, the string you send does not include the \0 terminator in the 86th array element.

On the receiving side you have:

char sentbatch[86];

{
    char command[89] = "./";
    // ...
    MPI_Recv(sentbatch,86,MPI_CHAR,0,rank,MPI_COMM_WORLD,&stat);
    strcat(command,sentbatch);
    // ...
}

sentbatch is never initialized, so it initially contains garbage. Since all incoming messages are 85 characters long, the 86th character is never overwritten and keeps whatever garbage was there to begin with. Hence, if that byte is not \0, strcat() keeps reading past the 85th character of sentbatch and appending garbage to command. Since both command and sentbatch live on the stack, the read continues until it hits a 0x00 somewhere on the stack, and by then the writes past the end of command will have clobbered other local variables or even the stack frame, causing a potential segfault later, or it runs to the end of the stack, which segfaults for certain. That it sometimes works, and works for some ranks, is pure chance.

Either change the MPI_Send to send 86 characters, or explicitly zero the 86th element of sentbatch. Or, better, use strncat(command, sentbatch, 85) to append no more than 85 characters, or receive directly into command with

MPI_Recv(&command[2],86,MPI_CHAR,0,rank,MPI_COMM_WORLD,&stat);

char command[89] = "./"; fills the remaining 87 elements of command[] with \0, so in that case there is no problem with the terminator.

【Discussion】:

  • Each line in the file is 84 characters long, not counting the newline and carriage return, so 86 characters in total. I had forgotten about the carriage return earlier; with only 85 characters the output skipped sending anything to tasks 1, 3, etc. The command buffer of length 89 works fine because it has 3 spare characters: 2 for the visible "./" plus the automatic \0. That said, I will try changing it to see if it helps.
  • Changing the array sizes and the send and receive to include one extra element does not change the output in any way. The line length is constant across every line of every file, since the file is created by another part of the code. I did not write that part, so I cannot share it here; how it creates the file is not part of this question anyway.
【Solution 2】:

After many attempts at searching for similar problems, I finally found the answer to the problem in my code. It just took searching for the error message alongside many different possible inputs.

The lines:

input = fopen(argv[1],"r");
fclose(input);

only need to happen inside rank 0. (The .bat file presumably only exists on node1, so on the other nodes fopen returns NULL and the unconditional fclose(input) crashes on the null pointer.) That means the correct code for running on multiple nodes is:

//has file open and closed moved to hopefully work on multiple nodes
//now only occurs for task0 which is on node1
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdbool.h>
#include <time.h> 
int main(int argc, char *argv[])
{
        FILE *input;
        char batLine[86];   //may need to be made larger if bat commands get longer
        char sentbatch[86];
        int currentTask;
        int numTasks, rank, rc, i;
        MPI_Status stat;
        bool exitFlag = false;

        //mpi stuff
        MPI_Init(&argc,&argv);  //initilize mpi enviroment
        MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        //printf("Number of tasks: %d \n", numTasks);
        //printf ("MPI task %d has started...\n", rank);
        if(argc != 2)
        {
            printf("Usage: batallocation *.bat");
            exit(1); //exit with 1 indicates a failure
        }
        if (rank ==0)
        {
            //contains file name: argv[1]
            input = fopen(argv[1],"r");
            while(1)
            {
                if(exitFlag)
                    break; //allows to break out of while and for when no more lines exist
                char command[89] = "./";
                for(i=0; i < 16; i++) //will need to be 16 for full testing
                {

                    //fgets needs to be character count of longest line + 2 or it fails
                    if(fgets(batLine,86,input) != NULL)
                    {
                        if(i==0)
                        {
                            strcat(command,batLine);
                            printf("rank0 gets: %s\n", command);
                            //system(command);
                        }
                        else
                        {
                            //MPI_Send(buffer,count,type,dest,tag,comm)
                            MPI_Send(batLine,85,MPI_CHAR,i,i,MPI_COMM_WORLD); 
                            printf("sent rank%d: %s\n",i,batLine);
                        }
                    }
                    else
                    {
                        exitFlag = true; //flag to break out of while loop
                        break;
                    }


                }   
                //need to recieve data from other nodes here
                //put the data together in proper order
                //and only after that can the next sets be sent out

            }
            fclose(input);
        }
        else
        {
            char command[89] = "./";
            //MPI_Recv(buffer,count,type,source,tag,comm,status)
            MPI_Recv(sentbatch,86,MPI_CHAR,0,rank,MPI_COMM_WORLD,&stat);
            //using rank as flag makes it so only the wanted rank gets sent the data
            strcat(command,sentbatch); //adds needed ./ before batch data
            printf("rank=%d recieved data:%s",rank,sentbatch);
            //system(command); //should run batch line
        }

        MPI_Finalize();
        return(0);
}

I do not know how acceptable it is to answer your own question, but I wanted to make sure that anyone who hits the same problem knows how to fix it. I know I hate finding a similar question only to see that the asker edited in that they solved it without explaining how.

【Discussion】:
