TensorFlow not detecting GPU

【Question title】: TensorFlow not detecting GPU
【Posted】: 2017-09-15 04:52:38
【Question description】:

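I installed CUDA 8.0 and the GPU version of TensorFlow on Ubuntu 16.04. At first it worked fine and used the GPU, but then it suddenly stopped using it. I installed TensorFlow through pip, and the GPU build was installed correctly, since it initially ran on the GPU.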

This is the message I get when importing tensorflow:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

So it is clearly able to find the CUDA libraries through LD_LIBRARY_PATH. But when I create a session, I get the following output:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 375.39.0
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:

So it cannot locate the GPU. nvidia-smi gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:01:00.0      On |                  N/A |
| 23%   41C    P8    11W / 250W |    337MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1005    G   /usr/lib/xorg/Xorg                             197MiB |
|    0      2032    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd    89MiB |
|    0     30355    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+

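I went through other Stack Overflow questions, but they mostly ask you to check LD_LIBRARY_PATH or nvidia-smi. For me, both look as expected, so I cannot figure out the problem.

Edit: I tried installing cuDNN 5 and adding it to LD_LIBRARY_PATH as well. TensorFlow now loads it successfully, but I still get the same error when creating a session.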
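
The "Couldn't open CUDA library libcudnn.so.5" line in the import log can be reproduced outside TensorFlow. The following is a minimal sketch, not part of the original post: it asks the dynamic loader to open a given library name and lists matching files in each LD_LIBRARY_PATH directory. The helper names `can_load` and `find_on_library_path` are hypothetical and chosen only for illustration.

```python
import ctypes
import glob
import os


def can_load(soname):
    """Return True if the dynamic loader can open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


def find_on_library_path(pattern, library_path=None):
    """List files matching `pattern` in each LD_LIBRARY_PATH directory."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in library_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a leading ':'
        matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches


if __name__ == "__main__":
    # The two cuDNN versions discussed in this thread.
    for name in ("libcudnn.so.5", "libcudnn.so.6"):
        print(name, "loadable:", can_load(name))
    print(find_on_library_path("libcudnn.so*"))
```

If `libcudnn.so.6` is loadable but `libcudnn.so.5` is not, that matches the situation described here: the installed TensorFlow build looks for version 5 while only version 6 is present. This check says nothing about the separate `cuInit` failure.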

【Question comments】:

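You need to install the cuDNN library.
@Drop I have installed it; I know that my LD_LIBRARY_PATH does not point to it. But shouldn't it still run without it? I am sure it was running without it, but then it somehow got messed up.
@Drop Also, I don't know why it wants only libcudnn.so.5 and not 6. I have version 6 installed, and I don't want to downgrade.
What I see is the log saying that cuDNN is missing. It asks for version 5 because your TF distribution is linked against that version. If you want, you can rebuild TF against v6 (though I am not sure it is supported yet). Also check whether any of these are enabled, which would prevent TF from seeing the GPU.
Another weird thing is that nvidia-smi cannot resolve the name of the device (I see "Graphics Device" there). 250 W and 12 GB: is it a Titan X or a Tesla? You might also want to check that the driver is installed properly.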
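
The comment above mentions settings that can prevent TF from seeing the GPU, but its link target is not preserved here. One setting that is known to hide GPUs from CUDA applications is the `CUDA_VISIBLE_DEVICES` environment variable; whether this is what the comment pointed to is an assumption. A minimal sketch of checking it, with the hypothetical helper name `hidden_gpu_reason`:

```python
import os


def hidden_gpu_reason(environ=None):
    """Explain whether CUDA_VISIBLE_DEVICES hides all GPUs, else return None."""
    if environ is None:
        environ = os.environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: CUDA enumerates every device
    if value.strip() in ("", "-1"):
        # An empty value or a leading invalid index leaves no visible device.
        return "CUDA_VISIBLE_DEVICES=%r hides all GPUs" % value
    return None


if __name__ == "__main__":
    print(hidden_gpu_reason() or "CUDA_VISIBLE_DEVICES does not hide all GPUs")
```

The check only covers the two values that hide every device. A value that lists specific indices still restricts which GPUs are visible, which this sketch does not report.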

Tags: tensorflow gpu cudnn

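【Solution 1】:

Simply rename "cudnn64_6.dll" to "cudnn64_5.dll".

【Comments】:

How/why does this solve the problem? Could you expand your answer into something more useful?
When you download the cuDNN 6.0 zip file, you find a file named "cudnn64_6.dll" in the bin folder, right? Rename it to "cudnn64_5.dll", and if you actually have the tensorflow-gpu version installed, everything should work.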
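
Whether "everything works" after any such change can be checked by asking TensorFlow which devices it sees. The sketch below is not from the thread; it uses `device_lib.list_local_devices()` from `tensorflow.python.client`, and the wrapper name `visible_gpu_names` is hypothetical. It returns `None` when TensorFlow is not importable, so it can be run on any machine.

```python
def visible_gpu_names():
    """Return names of GPU devices TensorFlow reports, or None without TensorFlow."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry describes one local device; keep only the GPUs.
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    print(visible_gpu_names())
```

An empty list corresponds to the "Device mapping: no known devices." line in the question's session log. Note that the answer's file names are Windows DLLs, while the question describes Ubuntu, where the library is `libcudnn.so.5`.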