Tensorflow 2.0rc not detecting GPUs: answers

【Question title】: Tensorflow 2.0rc not detecting GPUs
【Posted】: 2020-01-03 18:51:13
【Question description】:

TF2 is currently not detecting my GPUs. I migrated from TF 1.14, and

tf.keras.utils.multi_gpu_model(model=model, gpus=2)

now returns the error:

ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', '/xla_gpu:3']. Try reducing `gpus`.
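The error is a plain name comparison: the call needs `/cpu:0` plus `/gpu:0` and `/gpu:1`, but every GPU on this machine shows up only as an XLA device (`/xla_gpu:N`), so no regular `/gpu:N` entry exists and the check fails even though four GPUs are visible. Seeing XLA GPU devices without regular GPU devices is commonly reported when TensorFlow finds the NVIDIA driver but cannot load the CUDA libraries it was built against. The sketch below reproduces that comparison from the message text in plain Python; `parse_devices` and `missing_devices` are hypothetical helpers for illustration, not part of Keras.

```python
import re

# Device lists copied verbatim from the ValueError above.
ERROR_MSG = (
    "ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following "
    "devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine "
    "only has: ['/cpu:0', '/xla_cpu:0', '/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2', "
    "'/xla_gpu:3']. Try reducing `gpus`."
)


def parse_devices(msg):
    """Return (expected, available) device-name lists found in the message."""
    bracketed = re.findall(r"\[([^\]]*)\]", msg)  # the two [...] lists
    expected, available = [re.findall(r"'([^']+)'", s) for s in bracketed]
    return expected, available


def missing_devices(gpus, available):
    """Names a gpus=N request needs (/cpu:0 plus /gpu:0../gpu:N-1) that are absent."""
    wanted = ["/cpu:0"] + ["/gpu:%d" % i for i in range(gpus)]
    return [d for d in wanted if d not in available]


expected, available = parse_devices(ERROR_MSG)
xla_gpus = [d for d in available if d.startswith("/xla_gpu:")]
print("expected:", expected)
print("missing:", missing_devices(2, available))
print("GPUs registered only as XLA devices:", xla_gpus)
```

In TF 2.0 the devices TensorFlow actually registered can be inspected directly with `tf.config.experimental.list_logical_devices()`.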

Running nvidia-smi returns the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:09:00.0 Off |                    0 |
| N/A   46C    P0    62W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0    71W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    58W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:87:00.0 Off |                    0 |
| N/A   31C    P0    82W / 149W |      0MiB / 11441MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
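One detail in this output is easy to misread: the `CUDA Version` field in the nvidia-smi header is the newest CUDA version the installed driver supports, not the CUDA toolkit that is installed. Drivers are backward compatible, so driver 418.67 can also run CUDA 10.0 libraries. The minimal sketch below pulls both numbers out of the header line shown above; `parse_smi_header` is a hypothetical helper.

```python
import re

# Header line copied from the nvidia-smi output above.
HEADER = "| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |"


def parse_smi_header(line):
    """Return (driver_version, newest_cuda_supported_by_driver)."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line).group(1)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line).group(1)
    return driver, cuda


driver, max_cuda = parse_smi_header(HEADER)
# This says nothing about which toolkit (10.0, 10.1, ...) is actually installed.
print(f"driver {driver} supports CUDA up to {max_cuda}")
```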

Also, here is my TF version (the CUDA build):

2.0.0-rc0

Please let me know what I am doing wrong so I can fix it.

【Question comments】:

Which package did you install from pip, tensorflow or tensorflow-gpu?
tensorflow-gpu. I used the command pip install tensorflow-gpu==2.0.0-rc0

Tags: python keras tensorflow2.0

【Answer 1】:

I suggest the following:

First, check your CUDA version and make sure it is 10.0.
If it is 10.0, check that your TF build supports GPUs.
Check whether TF can access the GPU with:

import tensorflow as tf

# True if TensorFlow can use at least one GPU device.
value = tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
print('***If TF can access GPU: ***\n\n', value)  # must print True if it can

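I assume you have already handled the first two points. If TF can also access your GPUs then, as the ValueError shows, it does know the GPU names. I can't comment on the tf.keras.utils.multi_gpu_model() function, because I haven't used it in TF. Instead, I suggest using with tf.device('/gpu:0'): and defining or calling your model inside that block.
If the tf.device approach doesn't work either, just add the following lines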

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # indices of the GPUs to expose (all four here)

at the top of the file (before TensorFlow is imported) and remove the with tf.device('/gpu:0') block.
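`CUDA_VISIBLE_DEVICES` is only honoured if it is set before CUDA is initialised, which is why it belongs at the top of the file. A conservative sketch, assuming you want a hard failure instead of a silently ignored setting, is to refuse to set it once TensorFlow is already in `sys.modules`. `set_visible_gpus` is a hypothetical helper, not a TensorFlow API.

```python
import os
import sys


def set_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Conservative guard: raise if TensorFlow was already imported, because
    CUDA may then already be initialised and ignore the variable.
    """
    if "tensorflow" in sys.modules:
        raise RuntimeError("set CUDA_VISIBLE_DEVICES before importing tensorflow")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)


set_visible_gpus([0, 1, 2, 3])  # all four GPUs from the nvidia-smi listing
print(os.environ["CUDA_VISIBLE_DEVICES"])
# import tensorflow as tf  # import TensorFlow only after the variable is set
```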

【Comments】:

【Answer 2】:

CUDA should be version 10.0, not 10.1.
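To check which toolkit is installed, as opposed to the driver-supported version nvidia-smi reports, one option is to read the release from `nvcc --version`. This is a sketch under assumptions: `nvcc` may not be on PATH, and TF loads the CUDA libraries at runtime, so the nvcc release is only a proxy for the libraries TF will actually find. `REQUIRED_CUDA` encodes the 10.0 requirement stated in both answers; the helper names are hypothetical.

```python
import re
import shutil
import subprocess

REQUIRED_CUDA = "10.0"  # version both answers give for TF 2.0


def toolkit_version(nvcc_output):
    """Extract the release (e.g. '10.0') from `nvcc --version` output, else None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_toolkit():
    """Run nvcc if it is on PATH and return its release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    return toolkit_version(out)


sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(toolkit_version(sample) == REQUIRED_CUDA)  # False: the sample is a 10.1 toolkit
print("nvcc on this machine:", installed_toolkit())
```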

【Comments】: