【Question title】: TensorFlow not detecting GPU
【Posted】: 2017-09-15 04:52:38
【Question】:

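I installed CUDA 8.0 and the GPU version of TensorFlow on Ubuntu 16.04. It worked at first and used the GPU, but then it suddenly stopped using it. I installed TensorFlow through pip, and the GPU build was installed correctly, since it initially ran on the GPU.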

This is the message I get when importing tensorflow:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

So it clearly finds the CUDA libraries through LD_LIBRARY_PATH. But when I create a session, I get the following output:

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: naman-pc
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 375.39.0
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:

So it cannot find the GPU. nvidia-smi gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:01:00.0      On |                  N/A |
| 23%   41C    P8    11W / 250W |    337MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1005    G   /usr/lib/xorg/Xorg                             197MiB |
|    0      2032    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd    89MiB |
|    0     30355    G   compiz                                          37MiB |
+-----------------------------------------------------------------------------+

I went through other Stack Overflow questions, but most of them just say to check LD_LIBRARY_PATH or nvidia-smi. Both look as expected for me, so I cannot figure out what is wrong.

Edit: I installed cuDNN 5 and added it to LD_LIBRARY_PATH as well. TensorFlow now loads it successfully, but I still get the same error when creating a session.
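
The import log above reports that libcudnn.so.5 could not be opened and prints an LD_LIBRARY_PATH with a leading empty entry and several repeated directories. A minimal sketch of how one might check which of those directories actually contain a given library file is shown here. The helper name find_library_dirs is hypothetical and not part of TensorFlow or CUDA, and the check only looks at LD_LIBRARY_PATH, not at the system loader cache.

```python
import os


def find_library_dirs(lib_name, search_path):
    """Return the directories in an os.pathsep-separated path that contain lib_name."""
    seen = set()
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries (e.g. a leading ':') and repeated directories.
        if not directory or directory in seen:
            continue
        seen.add(directory)
        if os.path.exists(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    # Library names taken from the log in the question.
    for name in ("libcudnn.so.5", "libcublas.so.8.0"):
        found = find_library_dirs(name, path)
        print(name, "->", found if found else "not found on LD_LIBRARY_PATH")
```

An empty result for libcudnn.so.5 would be consistent with the "Couldn't open CUDA library" line, while a non-empty one suggests the file is present and the problem lies elsewhere.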

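【Comments】:

  • You need to install the cuDNN library.
  • @Drop I did install it; I know my LD_LIBRARY_PATH does not point to it. But shouldn't it still run without it? I am sure it ran without it before, but then something got messed up.
  • @Drop I also don't know why it only asks for libcudnn.so.5 and not 6. I have 6 installed and I don't want to downgrade.
  • What I see is that the log says cuDNN is missing. It asks for version 5 because your TF distribution is linked against that version. If you want, you can rebuild TF against v6 (though I am not sure yet whether it is supported). Also check whether any of these are enabled, which would prevent TF from seeing the GPU.
  • Another strange thing is that nvidia-smi cannot resolve the device name (I see "Graphics Device" there). 250W and 12Gb: is it a Titan X or a Tesla? You may also want to check that the driver is installed correctly.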
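
The link behind "any of these" in the comment above is not preserved, so the exact settings it referred to are unknown. One environment variable that is known to hide GPUs from CUDA applications is CUDA_VISIBLE_DEVICES; treating it as relevant here is an assumption. A minimal sketch for inspecting it, with a hypothetical helper name:

```python
import os


def describe_cuda_visibility(env):
    """Explain how CUDA_VISIBLE_DEVICES in the mapping `env` affects GPU visibility."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset: all GPUs are visible to CUDA applications"
    # An empty value or -1 leaves no valid device index, so nothing is exposed.
    if value.strip() in ("", "-1"):
        return "set to %r: every GPU is hidden from CUDA applications" % value
    return "set to %r: only the listed devices are visible" % value


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES is", describe_cuda_visibility(os.environ))
```

Run it in the same shell that launches Python, since the variable is inherited per process.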

Tags: tensorflow gpu cudnn


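【Solution 1】:

Just rename "cudnn64_6.dll" to "cudnn64_5.dll".

【Comments】:

  • How/why does this solve the problem? Could you expand your answer into something more useful?
  • When you download the cuDNN 6.0 zip file, you find a file named "cudnn64_6.dll" in the bin folder, right? Rename it to "cudnn64_5.dll", and as long as you actually installed the tensorflow-gpu version, everything should work.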
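
Since the answer turns on which cuDNN file name TensorFlow tries to open, it may help to first check which of the names mentioned in this thread the dynamic loader can open on your machine. The following is a rough sketch using only the standard ctypes module; can_load is a hypothetical helper, and a successful load says nothing about whether TensorFlow is compatible with that cuDNN version.

```python
import ctypes


def can_load(lib_name):
    """Return (True, '') if the dynamic loader can open lib_name, else (False, reason)."""
    try:
        ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # File names taken from the thread; only those matching your platform can succeed.
    for name in ("libcudnn.so.5", "libcudnn.so.6", "cudnn64_5.dll", "cudnn64_6.dll"):
        ok, reason = can_load(name)
        print(name, "loadable" if ok else "not loadable: " + reason)
```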