【Question Title】: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False, Dataloader Error, and setting pin_memory=False
【Posted】: 2026-01-16 22:20:04
【Question】:

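I am a beginner trying to evaluate this video object segmentation network paper.

I followed the instructions at https://github.com/seoungwugoh/STM

It lists the following requirements: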

python 3.6
pytorch 1.0.1.post2
numpy, opencv, pillow

I could not install that PyTorch version, so I installed PyTorch 1.5 from conda-forge instead.

I run this command under Anaconda, on either Windows 10 or Ubuntu 16.04:

(STMVOS) oneworld@oneworld:~/Documents/VideoObjectSegmentation/STMVOS$ python eval_DAVIS.py -g '1' -s val -y 16 -D ../DAVISSemiSupervisedTrainVal480

After pip install matplotlib, pip install tqdm ...

I get the following error message:

Space-time Memory Networks: initialized.
STM : Testing on DAVIS
Loading weights: STM_weights.pth
Traceback (most recent call last):
  File "eval_DAVIS.py", line 111, in <module>
    model.load_state_dict(torch.load(pth_path))
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 773, in _legacy_load
    result = unpickler.load()
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 729, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 178, in default_restore_location
    result = fn(storage, location)
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 154, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/home/oneworld/anaconda3/envs/STMVOS/lib/python3.8/site-packages/torch/serialization.py", line 138, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA'
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU

My graphics driver, system, and packages are as follows:

(STMVOS) oneworld@oneworld:~/Documents/VideoObjectSegmentation/STMVOS$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 26%   34C    P8    10W / 151W |    392MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1247      G   /usr/lib/xorg/Xorg                           229MiB |
|    0      2239      G   compiz                                       126MiB |
|    0      9385      G   /usr/lib/firefox/firefox                       2MiB |
|    0     11686      G   /proc/self/exe                                30MiB |
+-----------------------------------------------------------------------------+

I also tried:

(STMVOS) oneworld@oneworld:~/Documents/VideoObjectSegmentation/STMVOS$ python -c 'import torch; print(torch.rand(2,3).cuda())'

tensor([[0.9178, 0.8239, 0.4761],
        [0.9429, 0.8877, 0.0097]], device='cuda:0')

This shows that CUDA works here.

(STMVOS) oneworld@oneworld:~/Documents/VideoObjectSegmentation/STMVOS$ conda info
    active environment : STMVOS
    active env location : /home/oneworld/anaconda3/envs/STMVOS
            shell level : 1
       user config file : /home/oneworld/.condarc
 populated config files : 
          conda version : 4.8.2
    conda-build version : 3.18.11
         python version : 3.7.6.final.0
       virtual packages : __cuda=10.2
                          __glibc=2.23
       base environment : /home/oneworld/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/oneworld/anaconda3/pkgs
                          /home/oneworld/.conda/pkgs
       envs directories : /home/oneworld/anaconda3/envs
                          /home/oneworld/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.2 requests/2.22.0 CPython/3.7.6 Linux/4.4.0-179-generic ubuntu/16.04.6 glibc/2.23
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
(STMVOS) oneworld@oneworld:~/Documents/VideoObjectSegmentation/STMVOS$ conda list

# packages in environment at /home/oneworld/anaconda3/envs/STMVOS:
#
# Name              Version         Build                           Channel
_libgcc_mutex       0.1             main
blas                1.0             mkl
bzip2               1.0.8           h516909a_2                      conda-forge
ca-certificates     2020.4.5.1      hecc5488_0                      conda-forge
cairo               1.16.0          hcf35c78_1003                   conda-forge
certifi             2020.4.5.1      py38_0
cudatoolkit         10.2.89         hfd86e86_1
cycler              0.10.0          pypi_0                          pypi
dbus                1.13.6          he372182_0                      conda-forge
expat               2.2.9           he1b5a44_2                      conda-forge
ffmpeg              4.2.3           h167e202_0                      conda-forge
fontconfig          2.13.1          h86ecdb6_1001                   conda-forge
freetype            2.9.1           h8a8886c_1
gettext             0.19.8.1        hc5be6a0_1002                   conda-forge
giflib              5.2.1           h516909a_2                      conda-forge
glib                2.64.3          h6f030ca_0                      conda-forge
gmp                 6.2.0           he1b5a44_2                      conda-forge
gnutls              3.6.5           hd3a4fd2_1002                   conda-forge
graphite2           1.3.13          he1b5a44_1001                   conda-forge
gst-plugins-base    1.14.5          h0935bb2_2                      conda-forge
gstreamer           1.14.5          h36ae1b5_2                      conda-forge
harfbuzz            2.4.0           h9f30f68_3                      conda-forge
hdf5                1.10.6          nompi_h3c11f04_100              conda-forge
icu                 64.2            he1b5a44_1                      conda-forge
intel-openmp        2020.1          217
jasper              1.900.1         h07fcdf6_1006                   conda-forge
jpeg                9c              h14c3975_1001                   conda-forge
kiwisolver          1.2.0           pypi_0                          pypi
lame                3.100           h14c3975_1001                   conda-forge
ld_impl_linux-64    2.33.1          h53a641e_7
libblas             3.8.0           15_mkl                          conda-forge
libcblas            3.8.0           15_mkl                          conda-forge
libclang            9.0.1           default_hde54327_0              conda-forge
libedit             3.1.20181209    hc058e9b_0
libffi              3.2.1           he1b5a44_1007                   conda-forge
libgcc-ng           9.1.0           hdf63c60_0
libgfortran-ng      7.3.0           hdf63c60_0
libiconv            1.15            h516909a_1006                   conda-forge
liblapack           3.8.0           15_mkl                          conda-forge
liblapacke          3.8.0           15_mkl                          conda-forge
libllvm9            9.0.1           he513fc3_1                      conda-forge
libopencv           4.2.0           py38_6                          conda-forge
libpng              1.6.37          hbc83047_0
libstdcxx-ng        9.1.0           hdf63c60_0
libtiff             4.1.0           h2733197_0
libuuid             2.32.1          h14c3975_1000                   conda-forge
libwebp             1.0.2           h56121f0_5                      conda-forge
libxcb              1.13            h14c3975_1002                   conda-forge
libxkbcommon        0.10.0          he1b5a44_0                      conda-forge
libxml2             2.9.10          hee79883_0                      conda-forge
matplotlib          3.2.1           pypi_0                          pypi
mkl                 2020.1          217
mkl-service         2.3.0           py38he904b0f_0
mkl_fft             1.0.15          py38ha843d7b_0
mkl_random          1.1.1           py38h0573a6f_0
ncurses             6.2             he6710b0_1
nettle              3.4.1           h1bed415_1002                   conda-forge
ninja               1.9.0           py38hfd86e86_0
nspr                4.25            he1b5a44_0                      conda-forge
nss                 3.47            he751ad9_0                      conda-forge
numpy               1.18.1          py38h4f9e942_0
numpy-base          1.18.1          py38hde5b4d6_1
olefile             0.46            py_0
opencv              4.2.0           py38_6                          conda-forge
openh264            2.1.1           h8b12597_0                      conda-forge
openssl             1.1.1g          h516909a_0                      conda-forge
pcre                8.44            he1b5a44_0                      conda-forge
pillow              7.1.2           py38hb39fc2d_0
pip                 20.0.2          py38_3
pixman              0.38.0          h516909a_1003                   conda-forge
pthread-stubs       0.4             h14c3975_1001                   conda-forge
py-opencv           4.2.0           py38h23f93f0_6                  conda-forge
pyparsing           2.4.7           pypi_0                          pypi
python              3.8.1           h0371630_1
python-dateutil     2.8.1           pypi_0                          pypi
python_abi          3.8             1_cp38                          conda-forge
pytorch             1.5.0           py3.8_cuda10.2.89_cudnn7.6.5_0  pytorch
qt                  5.12.5          hd8c4c69_1                      conda-forge
readline            7.0             h7b6447c_5
setuptools          46.4.0          py38_0
six                 1.14.0          py38_0
sqlite              3.31.1          h62c20be_1
tk                  8.6.8           hbc83047_0
torchvision         0.6.0           py38_cu102                      pytorch
tqdm                4.46.0          pypi_0                          pypi
wheel               0.34.2          py38_0
x264                1!152.20180806  h14c3975_0                      conda-forge
xorg-kbproto        1.0.7           h14c3975_1002                   conda-forge
xorg-libice         1.0.10          h516909a_0                      conda-forge
xorg-libsm          1.2.3           h84519dc_1000                   conda-forge
xorg-libx11         1.6.9           h516909a_0                      conda-forge
xorg-libxau         1.0.9           h14c3975_0                      conda-forge
xorg-libxdmcp       1.1.3           h516909a_0                      conda-forge
xorg-libxext        1.3.4           h516909a_0                      conda-forge
xorg-libxrender     0.9.10          h516909a_1002                   conda-forge
xorg-renderproto    0.11.1          h14c3975_1002                   conda-forge
xorg-xextproto      7.3.0           h14c3975_1002                   conda-forge
xorg-xproto         7.0.31          h14c3975_1007                   conda-forge
xz                  5.2.5           h7b6447c_0
zlib                1.2.11          h7b6447c_3
zstd                1.3.7           h0b5b093_0

It gets stuck at the following code in eval_DAVIS.py:

print('Loading weights:', pth_path)
model.load_state_dict(torch.load(pth_path))

I am on Ubuntu 16.04, but I tried a similar setup on Windows 10 and got the same error message.

Any help is much appreciated.

Kind regards

OneWorld

【Comments】:

  • This is not a CUDA programming question. Please do not re-add the CUDA tag.

Tags: python python-3.x pytorch nvidia


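【Solution 1】:

I have just written a README.md for this project that gets it running successfully; it is here: Install PyTorch via pip to run STM Paper. I tested it on Windows 10 with CUDA version 10.1. Just follow that README.md step by step.

Your PyTorch install command may differ depending on your system configuration; get the install command as shown in the image below:

Your requirements.txt file should look like this:

Note: I did not do anything with [path/to/DAVIS] or anything else. You should be able to run the eval_DAVIS.py script without installation errors, which held for everything I tested. You should be able to run it on Ubuntu as well; just use the appropriate commands from README.md.

Happy coding!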

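【Comments】:

  • Hi Jay, thanks. I tried your suggestion but got the same error. (STMVOS) C:\....\STMVOS>nvcc --version reports "Cuda compilation tools, release 10.1, V10.1.105". I checked my installed PyTorch version: (STMVOS) C:\....\STMVOS>python -c "import torch; print(torch.__version__)" prints 1.5.0+cu101. I checked my system environment variables: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin and C:\Program Files\NVIDIA GPU Computing... Which PyTorch version and CUDA version are you using? It would be great to get coding.
  • My PyTorch version is 1.5.0 and my CUDA version is 10.1. Also, I noticed that your virtualenv is named STMVOS. Did you create that virtualenv from Conda Python? I don't actually use Conda Python.
  • This time I did not; I just did it from the Windows command line. Just now I ran your instructions again, using env as the environment name, and got the same error. Is your CUDA version 10.1.105? And which graphics driver version are you using?
  • On Windows 10, I am using NVIDIA graphics driver version 441.22. The GPU is a GeForce GTX 1070.
  • I also tried -g 0 instead of -g 1: eval_DAVIS.py -g '0' -s val -y 17 -D ../DAVIS2017SemiSupervisedTrainVal480. Same error.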
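【Solution 2】:

I changed my Python version from 3.8 to 3.6, using conda-forge for the install, and reinstalled matplotlib.

I ran eval_DAVIS.py in debug mode in MS VS Code instead of from the command line, commenting out the args as follows: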

# def get_arguments():
#     parser = argparse.ArgumentParser(description="SST")
#     parser.add_argument("-g", type=str, help="0; 0,1; 0,3; etc", required=True)
#     parser.add_argument("-s", type=str, help="set", required=True)
#     parser.add_argument("-y", type=int, help="year", required=True)
#     parser.add_argument("-viz", help="Save visualization", action="store_true")
#     parser.add_argument("-D", type=str, help="path to data",default='/local/DATA')
#     return parser.parse_args()

# args = get_arguments()

# GPU = args.g
# YEAR = args.y
# SET = args.s
# VIZ = args.viz
# DATA_ROOT = args.D

GPU = '0'
YEAR = '17'
SET = 'val'
VIZ = 'store_true'
DATA_ROOT = '..\\DAVIS2017SemiSupervisedTrainVal480'
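An alternative to commenting the parser out is to hand parse_args an explicit argument list while debugging. This is a minimal sketch that reuses the option names from the commented-out parser above; the `argv` parameter and the sample values are assumptions for illustration. It keeps the declared types, so -y stays an int and -viz a bool rather than the strings used in the hard-coded version (where the non-empty string 'store_true' is truthy and switches visualization on).

```python
import argparse


def get_arguments(argv=None):
    """Parse options; argv=None falls back to sys.argv as usual."""
    parser = argparse.ArgumentParser(description="SST")
    parser.add_argument("-g", type=str, help="0; 0,1; 0,3; etc", required=True)
    parser.add_argument("-s", type=str, help="set", required=True)
    parser.add_argument("-y", type=int, help="year", required=True)
    parser.add_argument("-viz", help="Save visualization", action="store_true")
    parser.add_argument("-D", type=str, help="path to data", default="/local/DATA")
    return parser.parse_args(argv)


# Debugger-friendly call: same parser, arguments supplied in code.
args = get_arguments(["-g", "0", "-s", "val", "-y", "17"])
print(args.g, args.s, args.y, args.viz, args.D)
```

Calling get_arguments() with no list restores normal command-line behaviour, so the same file works both in the debugger and from a shell.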

Just above the line

for seq, V in enumerate(Testloader):

I added this to test whether there is a CUDA availability problem.

if not torch.cuda.is_available():
    print("********** CUDA is NOT available just before line of error **********")
else:
    print("********** CUDA is available, and working fine just before line of error ***********")

This produces the following terminal log:

Space-time Memory Networks: initialized.
STM : Testing on DAVIS
using Cuda devices, num: 1
--- Produce mask overaid video outputs. Evaluation will run slow.
--- Require FFMPEG for encoding, Check folder ./viz
Loading weights: STM_weights.pth
Start Testing: STM_DAVIS_17val
********** CUDA is available, and working fine just before line of error ***********
Space-time Memory Networks: initialized.
STM : Testing on DAVIS
using Cuda devices, num: 1
--- Produce mask overaid video outputs. Evaluation will run slow.
--- Require FFMPEG for encoding, Check folder ./viz
Space-time Memory Networks: initialized.
STM : Testing on DAVIS
using Cuda devices, num: 1
--- Produce mask overaid video outputs. Evaluation will run slow.
--- Require FFMPEG for encoding, Check folder ./viz
Loading weights: STM_weights.pth
Loading weights: STM_weights.pth
Start Testing: STM_DAVIS_17val
********** CUDA is available, and working fine just before line of error ***********
Start Testing: STM_DAVIS_17val
********** CUDA is available, and working fine just before line of error ***********

It gets as far as this line of code

for seq, V in enumerate(Testloader):

and raises the following error message:

Exception has occurred: RuntimeError

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
  File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\eval_DAVIS.py", line 127, in <module>
    for seq, V in enumerate(Testloader):
  File "<string>", line 1, in <module>

So this eliminated the CUDA error without having to switch the code to use the CPU.
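One possible reading of why hard-coding GPU = '0' removed the CUDA error, offered as an assumption rather than something this thread confirms: if eval_DAVIS.py copies the -g value into CUDA_VISIBLE_DEVICES (the "using Cuda devices, num: 1" line appears only in the hard-coded run), then -g 1 on a machine whose only GPU is index 0 would hide that GPU, and the Windows console, which passes single quotes through literally, would hand over '0' with the quotes attached. The helpers `normalize_gpu_arg` and `hidden_indices` below are hypothetical and only sketch a defensive cleanup of that value.

```python
import re


def normalize_gpu_arg(raw):
    """Strip stray quotes/spaces from a GPU list such as "'0'" or "0, 1".

    Hypothetical helper, not part of the STM repository.
    """
    cleaned = raw.strip().strip("'\"").replace(" ", "")
    if not re.fullmatch(r"\d+(,\d+)*", cleaned):
        raise ValueError(f"not a comma-separated list of GPU indices: {raw!r}")
    return cleaned


def hidden_indices(gpu_arg, physical_gpu_count):
    """Return requested indices that do not exist on this machine."""
    requested = [int(i) for i in normalize_gpu_arg(gpu_arg).split(",")]
    return [i for i in requested if i >= physical_gpu_count]


print(normalize_gpu_arg("'0'"))   # quotes as a Windows console would pass them
print(hidden_indices("1", 1))     # -g 1 on a machine whose only GPU is index 0
```

If the assumption holds, the cleaned value would be what gets assigned to os.environ['CUDA_VISIBLE_DEVICES'] before the first CUDA call.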

However, it still produces the freeze_support() error...

The log identifies a DataLoader error:

Traceback (most recent call last):
  File "eval_DAVIS.py", line 127, in <module>
    for seq, V in enumerate(Testloader):
  File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 798, in _get_data
    success, data = self._try_get_data()
  File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 15916, 1232) exited unexpectedly
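A workaround sometimes used for this class of failure, sketched here under assumptions: loading data in the main process (num_workers=0) means no worker processes are started at all, at the cost of slower loading. `ToyFrames` is a hypothetical stand-in dataset, and the loader arguments are illustrative rather than the ones eval_DAVIS.py actually uses.

```python
import torch
from torch.utils import data


class ToyFrames(data.Dataset):
    """Hypothetical stand-in for the DAVIS test set: three tiny 'frames'."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        return torch.full((2,), float(index))


def batch_shapes(num_workers=0):
    """Iterate the loader and record the shape of each batch."""
    loader = data.DataLoader(ToyFrames(), batch_size=1, shuffle=False,
                             num_workers=num_workers, pin_memory=False)
    return [tuple(V.shape) for seq, V in enumerate(loader)]


if __name__ == "__main__":
    # num_workers=0 loads in the main process, so nothing is spawned.
    print(batch_shapes(num_workers=0))
```

Keeping the loop inside a function that is only called under the `__main__` guard also leaves room to raise num_workers again later.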

【Comments】:

    【Solution 3】:

    So, because of this error and the suggestion Python raised with it:

    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU
    

    I tried editing the code at line 111 of eval_DAVIS.py from this

    model.load_state_dict(torch.load(pth_path))
    

    to this

    model.load_state_dict(torch.load(pth_path, map_location=torch.device('cpu')))
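A slightly more general form of the same change, as a sketch: choose the target device at run time so the identical line works with or without a visible GPU. The helper name `load_weights` and the toy nn.Linear checkpoint are assumptions for illustration; only torch.load's map_location argument comes from the error message itself.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights(model, pth_path):
    """Load a checkpoint onto whichever device is actually usable."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location remaps storages that were saved from a CUDA device,
    # so the load also succeeds when no GPU is visible to this process.
    state_dict = torch.load(pth_path, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device), device


# Tiny stand-in model and checkpoint (hypothetical, for illustration only).
toy_model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_weights.pth")
    torch.save(toy_model.state_dict(), toy_path)
    restored, device = load_weights(nn.Linear(4, 2), toy_path)

print("weights loaded on:", device)
```

Note that this only makes the load succeed; if the printed device is cpu on a machine with a GPU, the process still cannot see that GPU and the evaluation will run on the CPU.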
    

    and then re-ran the code.

    (STMVOS) C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS>python eval_DAVIS.py -g '0' -s val -y 17 -D C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\DAVIS2017SemiSupervisedTrainVal480
    

    This got past the weight loading.

    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Loading weights: STM_weights.pth
    Start Testing: STM_DAVIS_17val
    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Loading weights: STM_weights.pth
    Loading weights: STM_weights.pth
    

    However, when it starts testing, the following error appears:

    Start Testing: STM_DAVIS_17val
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 265, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\context.py", line 326, in _Popen
        return Popen(process_obj)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.
    
            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:
    
                if __name__ == '__main__':
                    freeze_support()
                    ...
    
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Start Testing: STM_DAVIS_17val
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 265, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\context.py", line 326, in _Popen
        return Popen(process_obj)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.
    
            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:
    
                if __name__ == '__main__':
                    freeze_support()
                    ...
    
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 761, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\multiprocessing\queues.py", line 108, in get
        raise Empty
    _queue.Empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
        data = self._next_data()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 841, in _next_data
        idx, data = self._get_data()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 808, in _get_data
        success, data = self._try_get_data()
      File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 774, in _try_get_data
        raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
    RuntimeError: DataLoader worker (pid(s) 2412, 15788) exited unexpectedly
    

    That was using Anaconda; the error below is from using just the Windows command console and pip.

    (env) C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS>python eval_DAVIS.py -g '0' -s val -y 17 -D C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\DAVIS2017SemiSupervisedTrainVal480
    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Loading weights: STM_weights.pth
    Start Testing: STM_DAVIS_17val
    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Space-time Memory Networks: initialized.
    STM : Testing on DAVIS
    Loading weights: STM_weights.pth
    Loading weights: STM_weights.pth
    Start Testing: STM_DAVIS_17val
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
        prepare(preparation_data)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
        run_name="__mp_main__")
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
        is not going to be frozen to produce an executable.''')
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.
    
            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:
    
                if __name__ == '__main__':
                    freeze_support()
                    ...
    
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Start Testing: STM_DAVIS_17val
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
        prepare(preparation_data)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
        run_name="__mp_main__")
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
        w.start()
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
        is not going to be frozen to produce an executable.''')
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.
    
            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:
    
                if __name__ == '__main__':
                    freeze_support()
                    ...
    
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 761, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "C:\Users\OneWorld\AppData\Local\Programs\Python\Python37\lib\multiprocessing\queues.py", line 105, in get
        raise Empty
    _queue.Empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "eval_DAVIS.py", line 117, in <module>
        for seq, V in enumerate(Testloader):
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
        data = self._next_data()
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 841, in _next_data
        idx, data = self._get_data()
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 808, in _get_data
        success, data = self._try_get_data()
      File "C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS\env\lib\site-packages\torch\utils\data\dataloader.py", line 774, in _try_get_data
        raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
    RuntimeError: DataLoader worker (pid(s) 11448, 16644) exited unexpectedly
    

    I also put the following code in a small file called CUDATest.py to test whether Torch could run a simple matrix multiplication on the GPU.

    # testing CUDA
    import torch
    device = torch.cuda.current_device()
    
    n = 10
    # 1D inputs, same as torch.dot
    a = torch.rand(n).to(device)
    b = torch.rand(n).to(device)
    result = torch.matmul(a, b) # torch.Size([])
    
    print("matmul result = ", result)
    

    I ran it as follows:

    (env)C:\Users\OneWorld\Documents\DeepLearning\VideoObjectSegmentation\STMVOS>python CUDATest.py
    

    The result was:

    matmul result =  tensor(2.4603, device='cuda:0')
    

    This shows that my CUDA and PyTorch installation is working.
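One caveat about this check: `torch.cuda.current_device()` typically raises an error on a machine where CUDA is not usable, so the script can only confirm the good case. Below is a minimal device-agnostic sketch, assuming the goal is simply to run the same dot product on whatever device is available. The helper names `pick_device` and `dot_on` are made up for illustration.

```python
import torch


def pick_device():
    """Return the first CUDA device if one is usable, otherwise the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def dot_on(device, n=10):
    """Dot product of two random 1-D tensors created directly on `device`."""
    a = torch.rand(n, device=device)
    b = torch.rand(n, device=device)
    return torch.matmul(a, b)  # 1-D x 1-D gives a 0-dim tensor


device = pick_device()
result = dot_on(device)
print("device =", device, "| matmul result =", result)
```

If this prints `device = cpu` on a machine that has a GPU, the installed PyTorch build or the driver is the first thing to check.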

    【Discussion】:

      【Solution 4】:

      Because the Python error message suggested

      if __name__ == '__main__':
          freeze_support()
      

      I added this line

      if __name__ == '__main__':
      

      above the line

      for seq, V in enumerate(Testloader):
      

      and indented that line and everything below it.
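For context: on Windows, worker processes are started by spawning a fresh interpreter that re-imports the main module, so any loop that iterates a `DataLoader` with `num_workers > 0` needs to sit behind the guard. Here is a minimal sketch of the pattern, assuming a toy dataset in place of `DAVIS_MO_Test`. `ToyFrames` and `run` are hypothetical names, not part of the STM code.

```python
import torch
from torch.utils import data


class ToyFrames(data.Dataset):
    """Hypothetical stand-in for DAVIS_MO_Test: item i is a tensor filled with i."""

    def __init__(self, length=4):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return torch.full((3,), float(idx))


def run(num_workers=2):
    """Iterate the loader and return the sum of each batch."""
    loader = data.DataLoader(ToyFrames(), batch_size=1, shuffle=False,
                             num_workers=num_workers)
    return [batch.sum().item() for batch in loader]


# Only the process that was launched as the script enters this block.
# Spawned workers re-import the module but skip it, so they do not try
# to start workers of their own.
if __name__ == "__main__":
    print(run(num_workers=2))
```

Note that everything left at module level (model construction, weight loading) still runs again in each spawned worker, so moving that setup into a function called from the guarded block is usually the tidier layout.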

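      It then ran through to the end of [bike-packing].

      But before [blackswan] it asked for scipy to be installed.

      So I ran conda install scipy

      and re-ran the script. It started working through the rest: [bmx-trees], [breakdance], and so on.

      The resulting eval_DAVIS.py file looks like this...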

      from __future__ import division
      import torch
      from torch.autograd import Variable
      from torch.utils import data
      
      import torch.nn as nn
      import torch.nn.functional as F
      import torch.nn.init as init
      import torch.utils.model_zoo as model_zoo
      from torchvision import models
      
      # general libs
      import cv2
      import matplotlib.pyplot as plt
      from PIL import Image
      import numpy as np
      import math
      import time
      import tqdm
      import os
      import argparse
      import copy
      
      
      ### My libs
      from dataset import DAVIS_MO_Test
      from model import STM
      
      
      torch.set_grad_enabled(False) # Volatile
      
      # def get_arguments():
      #     parser = argparse.ArgumentParser(description="SST")
      #     parser.add_argument("-g", type=str, help="0; 0,1; 0,3; etc", required=True)
      #     parser.add_argument("-s", type=str, help="set", required=True)
      #     parser.add_argument("-y", type=int, help="year", required=True)
      #     parser.add_argument("-viz", help="Save visualization", action="store_true")
      #     parser.add_argument("-D", type=str, help="path to data",default='/local/DATA')
      #     return parser.parse_args()
      
      # args = get_arguments()
      
      # GPU = args.g
      # YEAR = args.y
      # SET = args.s
      # VIZ = args.viz
      # DATA_ROOT = args.D
      
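      GPU = '0'
      YEAR = 17   # int, so the (YEAR==16) comparison below is meaningful
      SET = 'val'
      VIZ = True  # was the string 'store_true', which is merely truthy
      DATA_ROOT = '..\\DAVIS2017SemiSupervisedTrainVal480'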
      
      # Model and version
      MODEL = 'STM'
      print(MODEL, ': Testing on DAVIS')
      
      os.environ['CUDA_VISIBLE_DEVICES'] = GPU
      if torch.cuda.is_available():
          print('using Cuda devices, num:', torch.cuda.device_count())
      
      if VIZ:
          print('--- Produce mask overlaid video outputs. Evaluation will run slow.')
          print('--- Require FFMPEG for encoding, Check folder ./viz')
      
      
      palette = Image.open(DATA_ROOT + '/Annotations/480p/blackswan/00000.png').getpalette()
      
      def Run_video(Fs, Ms, num_frames, num_objects, Mem_every=None, Mem_number=None):
          # initialize storage tensors
          if Mem_every:
              to_memorize = [int(i) for i in np.arange(0, num_frames, step=Mem_every)]
          elif Mem_number:
              to_memorize = [int(round(i)) for i in np.linspace(0, num_frames, num=Mem_number+2)[:-1]]
          else:
              raise NotImplementedError
      
          Es = torch.zeros_like(Ms)
          Es[:,:,0] = Ms[:,:,0]
      
          for t in tqdm.tqdm(range(1, num_frames)):
              # memorize
              with torch.no_grad():
                  prev_key, prev_value = model(Fs[:,:,t-1], Es[:,:,t-1], torch.tensor([num_objects])) 
      
              if t-1 == 0: # 
                  this_keys, this_values = prev_key, prev_value # only prev memory
              else:
                  this_keys = torch.cat([keys, prev_key], dim=3)
                  this_values = torch.cat([values, prev_value], dim=3)
      
              # segment
              with torch.no_grad():
                  logit = model(Fs[:,:,t], this_keys, this_values, torch.tensor([num_objects]))
              Es[:,:,t] = F.softmax(logit, dim=1)
      
              # update
              if t-1 in to_memorize:
                  keys, values = this_keys, this_values
      
          pred = np.argmax(Es[0].cpu().numpy(), axis=0).astype(np.uint8)
          return pred, Es
      
      
      
      Testset = DAVIS_MO_Test(DATA_ROOT, resolution='480p', imset='20{}/{}.txt'.format(YEAR,SET), single_object=(YEAR==16))
      Testloader = data.DataLoader(Testset, batch_size=1, shuffle=False, num_workers=2, pin_memory=True)
      
      model = nn.DataParallel(STM())
      if torch.cuda.is_available():
          model.cuda()
      model.eval() # turn-off BN
      
      pth_path = 'STM_weights.pth'
      print('Loading weights:', pth_path)
      model.load_state_dict(torch.load(pth_path)) # , map_location=torch.device('cpu')
      
      code_name = '{}_DAVIS_{}{}'.format(MODEL,YEAR,SET)
      print('Start Testing:', code_name)
      
      if not torch.cuda.is_available():
          print("********** CUDA is NOT available just before line of error **********")
      else:
          print("********** CUDA is available, and working fine just before line of error ***********")
      
      if __name__ == '__main__':
      
          for seq, V in enumerate(Testloader):
              Fs, Ms, num_objects, info = V
              seq_name = info['name'][0]
              num_frames = info['num_frames'][0].item()
              print('[{}]: num_frames: {}, num_objects: {}'.format(seq_name, num_frames, num_objects[0][0]))
      
              pred, Es = Run_video(Fs, Ms, num_frames, num_objects, Mem_every=5, Mem_number=None)
      
              # Save results for quantitative eval ######################
              test_path = os.path.join('./test', code_name, seq_name)
              if not os.path.exists(test_path):
                  os.makedirs(test_path)
              for f in range(num_frames):
                  img_E = Image.fromarray(pred[f])
                  img_E.putpalette(palette)
                  img_E.save(os.path.join(test_path, '{:05d}.png'.format(f)))
      
              if VIZ:
                  from helpers import overlay_davis
                  # visualize results #######################
                  viz_path = os.path.join('./viz/', code_name, seq_name)
                  if not os.path.exists(viz_path):
                      os.makedirs(viz_path)
      
                  for f in range(num_frames):
                      pF = (Fs[0,:,f].permute(1,2,0).numpy() * 255.).astype(np.uint8)
                      pE = pred[f]
                      canvas = overlay_davis(pF, pE, palette)
                      canvas = Image.fromarray(canvas)
                      canvas.save(os.path.join(viz_path, 'f{}.jpg'.format(f)))
      
                  vid_path = os.path.join('./viz/', code_name, '{}.mp4'.format(seq_name))
                  frame_path = os.path.join('./viz/', code_name, seq_name, 'f%d.jpg')
                  os.system('ffmpeg -framerate 10 -i {} {} -vcodec libx264 -crf 10  -pix_fmt yuv420p  -nostats -loglevel 0 -y'.format(frame_path, vid_path))
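
A side note on the `torch.load` line in the listing: the commented-out `map_location` hint is what the error message in the question recommends for CPU-only machines. Below is a minimal sketch of a loader that picks the target device at run time, assuming a checkpoint that is a plain dict of tensors. `load_weights` is a hypothetical helper, and the in-memory buffer merely stands in for `STM_weights.pth`.

```python
import io

import torch


def load_weights(path_or_buffer):
    """Load a checkpoint onto the GPU if one is usable, otherwise onto the CPU."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location remaps storages that were saved from a CUDA device.
    return torch.load(path_or_buffer, map_location=device)


# Stand-in checkpoint written to memory instead of disk.
buffer = io.BytesIO()
torch.save({"w": torch.ones(2, 2)}, buffer)
buffer.seek(0)

state = load_weights(buffer)
print(state["w"].device)
```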
      

      However...

      In the end I hit an out-of-memory error:

      [car-shadow]: num_frames: 40, num_objects: 1
      100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:09<00:00,  3.98it/s]
      Traceback (most recent call last):
        File "eval_DAVIS.py", line 129, in <module>
          for seq, V in enumerate(Testloader):
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
          data = self._next_data()
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
          return self._process_data(data)
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
          data.reraise()
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\_utils.py", line 395, in reraise
          raise self.exc_type(msg)
      RuntimeError: Caught RuntimeError in pin memory thread for device 0.
      Original Traceback (most recent call last):
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 31, in _pin_memory_loop
          data = pin_memory(data)
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in pin_memory
          return [pin_memory(sample) for sample in data]
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 55, in <listcomp>
          return [pin_memory(sample) for sample in data]
        File "C:\Users\OneWorld\anaconda3\envs\STMVOS\lib\site-packages\torch\utils\data\_utils\pin_memory.py", line 47, in pin_memory
          return data.pin_memory()
      RuntimeError: cuda runtime error (2) : out of memory at ..\aten\src\THC\THCCachingHostAllocator.cpp:278
      

      So, around line 108 of eval_DAVIS.py, I changed the Testloader from pin_memory=True to pin_memory=False:

      Testloader = data.DataLoader(Testset, batch_size=1, shuffle=False, num_workers=2, pin_memory=False)
      

      Then I re-ran it.

      It seems to work fine now.
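The traceback points at `THCCachingHostAllocator`, which suggests the failing allocation was pinned (page-locked) host memory rather than GPU memory, and that would explain why turning pinning off helps. Here is a minimal sketch of making pinning an explicit opt-in, assuming a toy `TensorDataset` in place of the DAVIS data. `make_loader` is a hypothetical helper.

```python
import torch
from torch.utils import data


def make_loader(dataset, pin_memory=False, num_workers=0):
    """Build a batch_size=1 loader; pinning is requested only when CUDA is usable."""
    use_pin = bool(pin_memory and torch.cuda.is_available())
    return data.DataLoader(dataset, batch_size=1, shuffle=False,
                           num_workers=num_workers, pin_memory=use_pin)


# Three samples of two floats each, standing in for video frames.
toy = data.TensorDataset(torch.arange(6, dtype=torch.float32).reshape(3, 2))
loader = make_loader(toy, pin_memory=False)
batches = [x for (x,) in loader]
print(len(batches), batches[0].shape)
```

Pinned memory mainly speeds up host-to-GPU copies, so leaving it off costs some transfer speed but should not change the results.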

      【Discussion】:
