【Question title】: RuntimeError: cudaGetDevice() failed. Status: invalid argument
【Posted】: 2021-01-13 10:45:43
【Question description】:

I am trying to run tensorflow-gpu on Windows 10 on a laptop with a Quadro GPU.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.21       Driver Version: 465.21       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T2000       WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P0    14W /  N/A |   2708MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

While trying to verify that everything works, I found that device_lib.list_local_devices() fails:

**RuntimeError: cudaGetDevice() failed. Status: invalid argument**


2021-01-13 11:30:14.735823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro T2000 computeCapability: 7.5
coreClock: 1.5GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.34GiB/s
2021-01-13 11:30:14.736173: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-13 11:30:14.736376: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-13 11:30:14.736590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-13 11:30:14.736801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-13 11:30:14.737016: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-13 11:30:14.737221: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-13 11:30:14.737418: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-13 11:30:14.737590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-13 11:30:14.737787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\D041705\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\client\device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: cudaGetDevice() failed. Status: invalid argument

Any hints? I have CUDA 11.2 and Python 3.8.7, and I installed the latest tf and tf-gpu packages.

【Question discussion】:

    Tags: python tensorflow gpu


    【Solution 1】:

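    I just solved this problem. I think the general solution is to downgrade your CUDA and GPU driver versions.


    First, according to the latest issue, TensorFlow 2.4 is not compatible with CUDA 11.2 or 11.1; use CUDA 11.0 instead.

    Second, if you are using the latest GPU driver, nvidia-smi will report CUDA version 11.3. Downgrade the GPU driver to an older version; 461.09 worked in my case.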
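One way to see which CUDA version an installed TensorFlow wheel expects is to ask TensorFlow itself. The sketch below is not from the original answer: the helper name `built_cuda_version` is hypothetical, and it assumes `tf.sysconfig.get_build_info()` is available (TensorFlow 2.3 or later, as far as I know). The exact format of the returned string may differ between platforms, so treat it as a hint rather than a precise version number.

```python
def built_cuda_version():
    """Return the CUDA version string this TensorFlow build reports, or None.

    None is returned when TensorFlow cannot be imported, or when the build
    info has no 'cuda_version' entry (for example on a CPU-only build).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # get_build_info() returns a dict-like object describing the build.
    info = tf.sysconfig.get_build_info()
    return info.get("cuda_version")


version = built_cuda_version()
print("TensorFlow was built against CUDA:", version)
```

If the reported value does not match the toolkit shown by `nvcc -V`, that mismatch is a likely place to start looking.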


    1. Win10 20H2
    2. GTX1070m
    3. TensorFlow '2.4.1'

    nvidia-smi

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 461.09       Driver Version: 461.09       CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    

    nvcc -V

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Wed_Jul_22_19:09:35_Pacific_Daylight_Time_2020
    Cuda compilation tools, release 11.0, V11.0.221
    Build cuda_11.0_bu.relgpu_drvr445TC445_37.28845127_0
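The two outputs above report different things: the `nvidia-smi` banner shows the driver version and the highest CUDA version that driver supports, while `nvcc -V` shows the installed toolkit release. As a minimal sketch, the relevant numbers can be pulled out with regular expressions. The function names `driver_info` and `toolkit_release` are hypothetical helpers, and the sample strings are copied from the outputs in this answer.

```python
import re


def driver_info(smi_text):
    """Return (driver_version, cuda_version) from an nvidia-smi banner, or None."""
    m = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", smi_text
    )
    return m.groups() if m else None


def toolkit_release(nvcc_text):
    """Return the toolkit release (e.g. '11.0') from `nvcc -V` output, or None."""
    m = re.search(r"release\s+([\d.]+)", nvcc_text)
    return m.group(1) if m else None


# Sample lines taken from the outputs shown in this answer.
smi_sample = "| NVIDIA-SMI 461.09       Driver Version: 461.09       CUDA Version: 11.2     |"
nvcc_sample = "Cuda compilation tools, release 11.0, V11.0.221"

print(driver_info(smi_sample))
print(toolkit_release(nvcc_sample))
```

To run this against a real machine, the text could be captured with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`, assuming the tool is on the PATH.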
    

    To test whether CUDA is working, you should run a computation with TensorFlow; listing the devices may not help.

    import tensorflow as tf
    tf.debugging.set_log_device_placement(True)
    
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    

    Output:

    Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
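The same check can be wrapped in a function that reports where the result tensor lives, instead of relying on the placement log. This is a sketch under assumptions, not part of the original answer: the helper name `gpu_matmul_device` is hypothetical, and it returns `None` when TensorFlow cannot be imported. Any runtime error raised by a broken CUDA setup is deliberately left to propagate so that it stays visible.

```python
def gpu_matmul_device():
    """Run a small matmul and return the device string of the result.

    Returns None if TensorFlow is not importable in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # An empty list here means TensorFlow does not see any GPU.
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    # In eager mode, a tensor exposes the device it was placed on.
    return c.device


device = gpu_matmul_device()
print(device)
```

On a working setup the printed device string should end in `GPU:0`; a string ending in `CPU:0` suggests the computation fell back to the CPU.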
    

    【Discussion】:
