【Question title】: New install doMPI throwing MPI_ERR_SPAWN error
【Posted】: 2018-03-14 10:57:23
【Question】:

I updated open-mpi to 3.0.0 and reloaded Rmpi and doMPI, and now I get this error when executing startCluster on Ubuntu Linux with R 3.4.2.

Error in mpi.comm.spawn(slave = rscript, slavearg = args, nslaves = count,  : 
  MPI_ERR_SPAWN: could not spawn processes

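How can I diagnose the problem?

【Comments】:

  • Start with simple MPI code. Make sure you can compile and run a simple MPI Hello World application. Your OpenMPI installation may be "broken".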
  • Thanks, mko. This is all new to me, but this works:
    mpirun -np 6 mpi_hello_world
    Hello world from processor JAM-Home-PC, rank 1 out of 6 processors
    Hello world from processor JAM-Home-PC, rank 5 out of 6 processors
    Hello world from processor JAM-Home-PC, rank 2 out of 6 processors
    ...
    but this does not:
    jamaas:code$ mpirun -np 7 mpi_hello_world
    There are not enough slots available in the system to satisfy the 7 slots ...: mpi_hello_world
    Either request fewer slots for your application, or make more slots available for use.

Tags: r parallel-processing mpi


【Solution 1】:

To test your MPI installation, do the following:

/* Put this text inside hello.c file */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank;   /* rank of this process */
    int world;  /* total number of processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world);
    printf("Hello: rank %d, world: %d\n", rank, world);
    MPI_Finalize();
    return 0;
}

Then compile it:

mpicc -o hello ./hello.c

Then try to run it:

mpirun -np 2 ./hello

If you get

Hello: rank 0, world: 2
Hello: rank 1, world: 2

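then your MPI installation is fine and you need to look inside R. Otherwise MPI is not configured correctly, and there is little chance of getting any further until that is fixed. The two lines may appear in either order.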
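The check above can be scripted. The sketch below assumes that `hello.c` sits in the current directory and that `mpicc` and `mpirun` are on `PATH`; the function name `check_hello_output` is made up for illustration. Because ranks may print in any order, it looks for each expected line instead of comparing the output verbatim.

```shell
#!/bin/sh
# check_hello_output NP TEXT
# Succeeds only if TEXT contains at least one line
# "Hello: rank R, world: NP" for every rank R in 0..NP-1.
# The order of the lines is ignored.
check_hello_output() {
    np=$1
    text=$2
    rank=0
    while [ "$rank" -lt "$np" ]; do
        if ! printf '%s\n' "$text" | grep -qx "Hello: rank $rank, world: $np"; then
            echo "missing output for rank $rank" >&2
            return 1
        fi
        rank=$((rank + 1))
    done
    return 0
}

# Run the real test only when the MPI tools and hello.c are available.
if command -v mpicc >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1 && [ -f hello.c ]; then
    mpicc -o hello ./hello.c
    if check_hello_output 2 "$(mpirun -np 2 ./hello)"; then
        echo "MPI installation looks fine"
    else
        echo "MPI check failed"
    fi
else
    echo "mpicc, mpirun, or hello.c not found; skipping"
fi
```

If the script reports a failure, the MPI installation itself needs attention before looking at R.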

Update

It looks like R 3.4 + OpenMPI 3.0.0 + Rmpi misbehave ;)

If you try to run the slaves outside of R, it works. So my guess is that there is some issue inside Rmpi's native code.

> cp -r /Library/Frameworks/R.framework/Versions/3.4/Resources/library/Rmpi ~
> cd ~/Rmpi
> mpirun -np 2 ./Rslaves.sh `pwd`/slavedaemon.R tmp needlog /Library/Frameworks/R.framework/Versions/3.4/Resources/
# if you put 
# localhost slots=25
# inside ~/.hostfile, you can acquire more resources
> mpirun --hostfile ~/.hostfile -np 4 ./Rslaves.sh `pwd`/slavedaemon.R tmp needlog /Library/Frameworks/R.framework/Versions/3.4/Resources/
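The paths above are specific to a macOS framework install of R. On other systems (the question is about Ubuntu) the same experiment could be set up roughly as follows. This is only a sketch: `build_slave_cmd` is a hypothetical helper that prints the command without running it, the Rmpi directory is looked up with R's `system.file()`, R's home directory comes from `R RHOME`, and the arguments passed to `Rslaves.sh` simply mirror the ones used above.

```shell
#!/bin/sh
# build_slave_cmd RMPI_DIR R_HOME NP
# Prints (does not run) an mpirun command line shaped like the one above.
build_slave_cmd() {
    rmpi_dir=$1
    r_home=$2
    np=$3
    printf 'mpirun -np %s %s/Rslaves.sh %s/slavedaemon.R tmp needlog %s\n' \
        "$np" "$rmpi_dir" "$rmpi_dir" "$r_home"
}

if command -v Rscript >/dev/null 2>&1; then
    # system.file() returns an empty string when the package is not installed.
    rmpi_dir=$(Rscript -e 'cat(system.file(package = "Rmpi"))')
    r_home=$(R RHOME)
    if [ -n "$rmpi_dir" ]; then
        build_slave_cmd "$rmpi_dir" "$r_home" 2
    else
        echo "Rmpi is not installed in this R library" >&2
    fi
else
    echo "Rscript not found; skipping" >&2
fi
```

Printing the command first makes it easy to inspect the resolved paths before launching anything.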

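Update: a proper fix for R 3.4 and OpenMPI 3.0.0

Create the file ~/.openmpi/mca-params.conf and put this in it (replace YOUR_USER_HOME with the path to your home directory):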

orte_default_hostfile=YOUR_USER_HOME/default_host

Create the file ~/default_host with this content:

localhost slots=25
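The two files can also be created from a shell. This is a minimal sketch under a few assumptions: the helper name `write_mpi_defaults` is invented here, the slot count of 25 is simply the value used above, and the base-directory argument exists only so the helper can be tried in a scratch directory (Open MPI reads the per-user file from the home directory).

```shell
#!/bin/sh
# write_mpi_defaults BASE_DIR SLOTS
# Writes BASE_DIR/default_host and makes
# BASE_DIR/.openmpi/mca-params.conf point at it.
write_mpi_defaults() {
    base=$1
    slots=$2
    conf="$base/.openmpi/mca-params.conf"
    mkdir -p "$base/.openmpi" || return 1
    printf 'localhost slots=%s\n' "$slots" > "$base/default_host" || return 1
    # Append the setting only once so that reruns do not duplicate it.
    if ! grep -q '^orte_default_hostfile' "$conf" 2>/dev/null; then
        printf 'orte_default_hostfile=%s\n' "$base/default_host" >> "$conf"
    fi
}

# Apply the settings to the current user's home directory.
write_mpi_defaults "$HOME" 25
```

An existing `orte_default_hostfile` line is left alone, so check `~/.openmpi/mca-params.conf` by hand if it already contained one.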

Run R, load Rmpi, and run the code:

> library(Rmpi)
> mpi.spawn.Rslaves()
    4 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 5 is running on: pi
slave1 (rank 1, comm 1) of size 5 is running on: pi
slave2 (rank 2, comm 1) of size 5 is running on: pi
slave3 (rank 3, comm 1) of size 5 is running on: pi
slave4 (rank 4, comm 1) of size 5 is running on: pi

For the full story, see: R3.4 + OpenMPI 3.0.0 + Rmpi inside macOS - little bit of mess ;)
