[Posted]: 2014-04-15 00:20:25
[Question]:
I am implementing an algorithm in C++ using MPI. There are many files to process. This is my design:
int main()
{
    MPI_Init();
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Barrier(..);
    if(my_rank == 0)
    {
        for (each file to be processed)
        {
            Read in file content;
            MPI_Send data to child processes;
            process partial data on root process;
            MPI_Recv data processed by child processes;
            combine processed data from root and children;
        }
    }
    else
    {
        MPI_Recv data from root;
        process received data;
        MPI_Send processed data to root;
        MPI_Finalize();
    }
    //only root process reaches here
    MPI_Finalize();
}
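When there is only one file to process, the program runs perfectly. With more than one file, however, it hangs on the second file, and it seems that no child process is available to receive new data from the root. I think this is because I terminate the child processes after the first file has been processed. But if I comment out MPI_Finalize() in the else block, the program exits after processing the first data file with this error: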
mpirun has exited due to process rank 1 with PID 2003 on
node c301-115 exiting improperly. There are three reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.
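In this situation, is there a way to reset the MPI instance for the child processes? Where is the best place to finalize the child processes?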
[Discussion]:
-
MPI_Finalize does not cause the program to exit. Your worker processes actually call that function twice, which may produce unexpected results.