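【Posted】:2020-11-06 17:42:30
【Question】:
I am having trouble running a Python script on a computing cluster, and I apologize in advance if this is a naive mistake. I am not sure whether the problem comes from misconfiguring my own conda virtual environment, but it persists when I run: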
srun -p use-everything --pty python test.py
I get the following error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
from acme.agents.tf import dqn
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/agents/tf/dqn/__init__.py", line 18, in <module>
from acme.agents.tf.dqn.agent import DQN
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/agents/tf/dqn/agent.py", line 20, in <module>
from acme import datasets
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/datasets/__init__.py", line 17, in <module>
from acme.datasets.reverb import make_reverb_dataset
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/datasets/reverb.py", line 22, in <module>
from acme.adders import reverb as adders
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/adders/reverb/__init__.py", line 21, in <module>
from acme.adders.reverb.base import DEFAULT_PRIORITY_TABLE
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/acme/adders/reverb/base.py", line 26, in <module>
import reverb
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/__init__.py", line 27, in <module>
from reverb import item_selectors as selectors
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/item_selectors.py", line 19, in <module>
from reverb import pybind
File "/om2/user/armas/anaconda/envs/dist_rl/lib/python3.7/site-packages/reverb/pybind.py", line 1, in <module>
import tensorflow as _tf; from .libpybind import *; del _tf
ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
srun: error: node014: task 0: Exited with exit code 1
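On my local machine I hit the same problem when running inside a virtual environment, and I fixed it simply with sudo apt-get install libpython3.7.
Here is some additional information that may help.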
$which libpython
/usr/bin/which: no libpython in (/om2/user/armas/anaconda/envs/dist_rl/bin:/om2/user/armas/anaconda/bin:/om2/user/armas/anaconda/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
$echo $PATH
/om2/user/armas/anaconda/envs/dist_rl/bin:/om2/user/armas/anaconda/bin:/om2/user/armas/anaconda/condabin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
$echo $LD_LIBRARY_PATH
/om2/user/armas/anaconda/bin/
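When I change my LD_LIBRARY_PATH, i.e. export LD_LIBRARY_PATH=/om2/user/armas/anaconda/lib:$LD_LIBRARY_PATH, and run the script, my anaconda thinks I do not have jax installed. I ran pip install dm-acme[jax], and now when I run the script it says there is no module named atari_py. I think it is leading me down a chain of missing dependencies.
I installed acme by following this link, but inside a conda environment. My system administrator said that acme may not have been built to work with anaconda. If so, why would that be?
Please let me know if anything is missing and I will add it. Thanks again!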
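The chain of missing modules described above (jax, then atari_py) can be listed in one pass rather than discovered one failed run at a time. A minimal sketch, assuming the module names below are the ones of interest; the helper name `missing_modules` is hypothetical, and `importlib.util.find_spec` only checks that a top-level module can be located, not that it imports cleanly:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper: it only checks that each module can be found
    by the running interpreter, not that importing it succeeds.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Names taken from the traceback and errors in the question; extend as needed.
    print(missing_modules(["acme", "reverb", "tensorflow", "jax", "atari_py"]))
```

To get a meaningful answer, run it with the same interpreter and launcher as the failing script (for example through the same srun command), since the result depends on which environment is active.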
【Comments】:
-
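Two things might help: (1) can you give us the full error traceback leading up to the ImportError line, and (2) can you give us some idea of what kind of code test.py runs? To me it looks a bit like an extension module that was compiled in one environment and is now being executed in the wrong one. As a side note, a couple of things in your diagnostics are odd. which libpython will not work, because which only finds executables, not libraries. And $echo $LD_LIBRARY_PATH should print the path to a library directory (e.g. /om2/user/armas/anaconda/lib) rather than bin.
-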
Hi, I have updated the post. Let me know what you think. Thanks!
-
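I ran into this problem too when using dearpygui in a virtual environment, and this solution worked. I had to run apt-get from within the virtual environment at the CMD prompt to install the required library.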
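
As the first comment points out, which searches PATH for executables, so it cannot locate a shared library. A minimal sketch of an alternative check, assuming a Linux build of CPython: it asks the running interpreter where its own libpython is expected to live and whether that directory is listed in LD_LIBRARY_PATH. The helper name `libpython_report` is hypothetical; `LIBDIR` and `INSTSONAME` are build-time configuration variables that may be unset on some platforms, in which case the path is reported as `None`.

```python
import os
import sysconfig


def libpython_report(environ=None):
    """Describe where this interpreter expects its libpython to be.

    Hypothetical helper. `environ` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    environ = os.environ if environ is None else environ

    # Build-time settings of the running interpreter; either may be None.
    libdir = sysconfig.get_config_var("LIBDIR")
    soname = sysconfig.get_config_var("INSTSONAME")
    path = os.path.join(libdir, soname) if libdir and soname else None

    # Directories the dynamic loader is told to search via the environment.
    search_dirs = [
        d for d in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d
    ]
    on_path = bool(libdir) and any(
        os.path.realpath(d) == os.path.realpath(libdir) for d in search_dirs
    )

    return {
        "path": path,
        "exists": bool(path) and os.path.exists(path),
        "libdir_on_ld_library_path": on_path,
    }


if __name__ == "__main__":
    for key, value in libpython_report().items():
        print(f"{key}: {value}")
```

Run it with the interpreter from the environment that fails. The dynamic loader also finds libraries through other mechanisms (such as an RPATH embedded in the extension module or the ldconfig cache), so a False for libdir_on_ld_library_path is a hint, not proof of the cause.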