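【Posted】: 2016-01-04 14:32:12
【Question】:
I wrote a small script in Python 2.7 that uses the mpi4py library to launch 4 independent processes in the shell through subprocess. I get an ORTE_ERROR_LOG, and I would like to understand where it occurs and why.
Here is my code: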
#!/usr/bin/python
import subprocess
import re
import sys
from mpi4py import MPI

def main():
    root='base'
    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        job = [root+str(i) for i in range(4)]
    else:
        job = None
    job = comm.scatter(job, root=0)
    cmd="../../montepython/montepython/MontePython.py -conf ../config/default.conf -p ../config/XXXX.param -o ../chains/XXXX -N 10000 > XXXX.log"
    cmd_job = re.sub(r"XXXX", job, cmd)
    subprocess.check_call(cmd_job, shell=True)
    return

if __name__ == '__main__':
    main()
I run it with the command:
mpirun -np 4 ./run.py
This is the error message I get:
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: localhost
System call: open(2)
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
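I cannot work out where the error occurs. MontePython itself should not be using MPI, because it is supposed to run serially.
I asked for help on the Open MPI users forum. They told me the problem is probably caused by a bad interaction between subprocess and the MPI implementation, and that I should switch from subprocess to spawn. However, that function is not well documented, and I am not sure how to proceed.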
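For comparison, here is a minimal sketch of the same four runs started without MPI at all, on the assumption that the jobs are fully independent and serial, so mpi4py is only being used to hand out job names. This is not the spawn approach suggested on the forum. The names build_commands and run_all are hypothetical helpers introduced for illustration; the command template is copied from run.py above.

```python
#!/usr/bin/python
import subprocess

# Same command template as in run.py; XXXX is replaced by the job name.
CMD_TEMPLATE = ("../../montepython/montepython/MontePython.py "
                "-conf ../config/default.conf -p ../config/XXXX.param "
                "-o ../chains/XXXX -N 10000 > XXXX.log")


def build_commands(root, n_jobs, template=CMD_TEMPLATE):
    """Return one shell command per job name: root0, root1, ..."""
    return [template.replace("XXXX", root + str(i)) for i in range(n_jobs)]


def run_all(commands):
    """Start every command at once, wait for all of them, return exit codes."""
    # Popen returns immediately, so the jobs run concurrently.
    procs = [subprocess.Popen(cmd, shell=True) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling run_all(build_commands('base', 4)) from a script started with plain python (no mpirun) would launch the four jobs side by side on the local machine. Whether this avoids the ORTE_ERROR_LOG in my setup is untested; it only removes MPI from the launcher.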
【Discussion】:
Tags: python shell subprocess mpi