【Posted】: 2021-09-06 05:16:35
【Question】:
I have a problem with this part of my code (it is executed by every task):
for (i = 0; i < m; i++) {
// some code
MPI_Reduce(&res, &mn, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
// some code
}
This works fine, but for larger values of m I get this error:
Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1198).........................: MPI_Reduce(sbuf=008FFC80, rbuf=008FFC8C, count=1, MPI_INT, MPI_MIN, root=0, MPI_COMM_WORLD) failed
MPIR_Reduce(764)..........................:
MPIR_Reduce_binomial(207).................:
MPIC_Send(41).............................:
MPIC_Wait(513)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(436):
MPIDI_CH3_PktHandler_EagerShortSend(306)..: Failed to allocate memory for an unexpected message. 261895 unexpected messages queued.
job aborted:
rank: node: exit code[: error message]
0: AmirDiab: 1
1: AmirDiab: 1
2: AmirDiab: 1: Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1198).........................: MPI_Reduce(sbuf=008FFC80, rbuf=008FFC8C, count=1, MPI_INT, MPI_MIN, root=0, MPI_COMM_WORLD) failed
MPIR_Reduce(764)..........................:
MPIR_Reduce_binomial(207).................:
MPIC_Send(41).............................:
MPIC_Wait(513)............................:
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(436):
MPIDI_CH3_PktHandler_EagerShortSend(306)..: Failed to allocate memory for an unexpected message. 261895 unexpected messages queued.
3: AmirDiab: 1
Any suggestions?
【Comments】:
-
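MPI_INT does not match bool (in C++ use MPI_CXX_BOOL, see stackoverflow.com/questions/57598517/…) -
Thanks for your comment.
bool is actually int here: my code contains typedef int bool;, so I defined bool as int. Sorry if that was confusing. I will edit the question :) -
The root cause may be earlier memory corruption. Can you edit your question to include a minimal reproducible example?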
-
What is the return value of MPI_Reduce?
-
I am new to MPI and don't know how reduce actually works across tasks. Do you mean I should avoid using
MPI_Reduce inside a loop? @GillesGouaillardet