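【Posted】: 2018-09-22 00:48:48
【Question】:
I implemented a working barrier in MPI, but sometimes it runs to completion and sometimes it crashes, on what I think is the last line of the main function. The strange part is that it only crashes about 30% of the time.
My code: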
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <time.h>
#include <sys/time.h>
#include <mpi.h>

void barrier(){
    //Get rank and number of processors
    int my_rank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    //MPI Communication status var
    MPI_Status status;
    char b_req[1];
    //**Barrier implementation for P0**
    if(my_rank == 0){
        double exec_start_all = MPI_Wtime();
        bool *nodes_barrier = (bool *)malloc(num_procs*sizeof(bool));
        *(nodes_barrier) = true;
        int start_count = 1; int end_count = 1; int i;
        //Set all values array, except for P0, on false
        for(i=1; i<num_procs; i++)
            *(nodes_barrier + i) = false;
        //Receive msg from all procs which started their barrier
        while(start_count < num_procs){
            MPI_Recv(&b_req,sizeof(b_req),MPI_CHAR, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status);
            if(*(nodes_barrier + status.MPI_SOURCE) == false){
                *(nodes_barrier + status.MPI_SOURCE) = true;
                start_count ++;
            }
        }
        //Once all procs started their barrier, send msg to all procs to release
        int j;
        for(j=1; j<num_procs; j++)
            MPI_Send(&b_req,sizeof(b_req),MPI_CHAR,j,2,MPI_COMM_WORLD);
        //Get execution time of barrier from all procs and calculate the overall barrier execution time
        double *tmp = (double *)malloc(2*sizeof(double));
        double exec_end_all = MPI_Wtime();
        while(end_count < num_procs){
            MPI_Recv((double *)tmp,(2*sizeof(double)),MPI_DOUBLE, MPI_ANY_SOURCE, 3, MPI_COMM_WORLD, &status);
            if(*(nodes_barrier + status.MPI_SOURCE) == true){
                double start = ((double)*tmp+0);
                double end = ((double)*tmp+1);
                printf("P(%d) has start: %lf end: %lf \n",status.MPI_SOURCE, start, end);
                if(start < exec_start_all)
                    exec_start_all = start;
                if(end > exec_end_all)
                    exec_end_all = end;
                *(nodes_barrier + status.MPI_SOURCE) = false;
                end_count ++;
            }
        }
        /*if(MPI_Wtime() > exec_end_all){
        exec_end_all = MPI_Wtime();
        }&*/
        printf("Barrier finished, start: %lf End: %lf Execution time: %lf \n", exec_start_all, exec_end_all, (exec_end_all - exec_start_all));
        free(nodes_barrier);
        free(tmp);
    //**Barrier implementation for Pn-1 (except P0)**
    } else {
        double *execution = (double *)malloc(2*sizeof(double));
        *(execution + 0) = MPI_Wtime();
        //Send P0 that this proc starts its barrier
        MPI_Send(&b_req,sizeof(b_req),MPI_CHAR,0,1,MPI_COMM_WORLD);
        //Receive command from P0 that it can release its barrier
        MPI_Recv(&b_req,sizeof(b_req),MPI_CHAR, 0, 2, MPI_COMM_WORLD, &status);
        //Measure and send execution time to P0
        *(execution + 1) = MPI_Wtime();
        MPI_Send((double *)execution,(2*sizeof(double)),MPI_DOUBLE,0,3,MPI_COMM_WORLD);
        free(execution);
    }
}

int main(int argc, char *argv[])
{
    //Initialize the infrastructure necessary for communication
    MPI_Init(&argc,&argv);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    barrier();
    printf("FINAL %d \n", my_rank);
    MPI_Finalize();
    return 0;
}
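Sometimes I get this error; note that it always happens at the end of execution:
And sometimes I get this error instead:
Does anyone know what is going on? Or can you at least point me in the right direction?
Thanks in advance.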
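One thing that may be worth checking in the code above: the count argument of `MPI_Send` and `MPI_Recv` is a number of elements of the given datatype, not a number of bytes. The timing messages pass `2*sizeof(double)` together with `MPI_DOUBLE`, while the buffers only hold 2 doubles. The sketch below is plain C (no MPI required) and only works through that arithmetic; the helper names `bytes_touched` and `overrun_bytes` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* MPI count arguments are numbers of elements, so a call with `count`
   elements of an `elem_size`-byte datatype touches count * elem_size bytes. */
static size_t bytes_touched(size_t count, size_t elem_size) {
    return count * elem_size;
}

/* How many bytes past the end of an `allocated`-byte buffer such a call
   would reach (0 if the transfer fits inside the buffer). */
static size_t overrun_bytes(size_t count, size_t elem_size, size_t allocated) {
    size_t touched = bytes_touched(count, elem_size);
    return touched > allocated ? touched - allocated : 0;
}
```

Assuming 8-byte doubles, `overrun_bytes(2 * sizeof(double), sizeof(double), 2 * sizeof(double))` is 112, while `overrun_bytes(2, sizeof(double), 2 * sizeof(double))` is 0. If that is what happens here, the receive on P0 could write past the end of `tmp` and corrupt the heap, which would be consistent with an intermittent crash that only shows up late in the run. Passing `2` as the count in both the `MPI_Send` and the `MPI_Recv` for tag 3 would be the change to try. The `MPI_CHAR` messages use `sizeof(b_req)`, which is 1 and so happens to match the element count.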
【Discussion】:
-
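You cannot use MPI_BYTE to transfer a bool. MPI lets you use MPI_C_BOOL together with _Bool.
-
@GillesGouaillardet I just changed the datatype to char (MPI_CHAR), since _Bool is not supported; however, I still get the same error. Thanks for helping me :)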
-
I also added a new picture with the full output of my program; you can see that the error occurs at the end of execution.
-
In general, do not use screenshots unless necessary. In this case, copying and pasting the error is the most appropriate option.
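Separately from the crash, the two lines in the P0 branch that read the received timestamps may not do what was intended. In `((double)*tmp+1)` the cast and the dereference bind more tightly than `+`, so the expression is `tmp[0] + 1`, not `tmp[1]`. A minimal sketch in plain C; the function names `read_as_written` and `read_second` are hypothetical and used only for illustration.

```c
/* Same expression shape as in the question: the cast and the dereference
   bind tighter than '+', so this evaluates to tmp[0] + 1.0. */
static double read_as_written(const double *tmp) {
    return ((double)*tmp + 1);
}

/* Reads the second element, which is presumably what was intended. */
static double read_second(const double *tmp) {
    return *(tmp + 1);   /* same as tmp[1] */
}
```

With a buffer holding `{10.0, 25.0}`, `read_as_written` gives 11.0 and `read_second` gives 25.0. In the posted code this would make the printed `end` equal to `start + 1` rather than the second timestamp; writing `tmp[0]` and `tmp[1]` avoids the ambiguity.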
Tags: segmentation-fault runtime-error runtime mpi distributed-computing