Tensorflow-2.3.0 does not detect GPU

【Question title】: Tensorflow-2.3.0 does not detect GPU
【Posted】: 2020-09-13 14:31:09
【Question description】:

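I am using Ubuntu 20.04. I upgraded from Tensorflow-2.2.0 to Tensorflow-2.3.0. With version 2.2.0, TensorFlow used the GPU without problems. After the upgrade to 2.3.0, it no longer detects the GPU.

I have seen this Link on Stack Overflow. There the problem was the cuDNN version, but I already have the required cuDNN version.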

me_sajied@Kunai:~$ apt list | grep cudnn

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcudnn7-dev/now 7.6.5.32-1+cuda10.1 amd64 [installed,local]
libcudnn7/now 7.6.5.32-1+cuda10.1 amd64 [installed,local]

I also have all the other required software in the required versions.

CUDA

me_sajied@Kunai:~$ apt list | grep cuda-toolkit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-toolkit-10-0/unknown 10.0.130-1 amd64
cuda-toolkit-10-1/unknown,now 10.1.243-1 amd64 [installed,automatic]
cuda-toolkit-10-2/unknown 10.2.89-1 amd64
cuda-toolkit-11-0/unknown,unknown 11.0.3-1 amd64
nvidia-cuda-toolkit-gcc/focal 10.1.243-3 amd64
nvidia-cuda-toolkit/focal 10.1.243-3 amd64

Python

me_sajied@Kunai:~$ python3 --version
Python 3.8.2

Environment

LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64"

Log

me_sajied@Kunai:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-09-13 21:28:37.387327: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> 
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-09-13 21:28:48.806385: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-13 21:28:48.836251: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2699905000 Hz
2020-09-13 21:28:48.836637: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3fde5f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-13 21:28:48.836685: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-13 21:28:48.840030: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-13 21:28:48.882190: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-13 21:28:48.882582: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x408bd90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-13 21:28:48.882606: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce 930MX, Compute Capability 5.0
2020-09-13 21:28:48.882796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-13 21:28:48.883151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce 930MX computeCapability: 5.0
coreClock: 1.0195GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 14.92GiB/s
2020-09-13 21:28:48.883196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-13 21:28:48.883415: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64
2020-09-13 21:28:48.885196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-13 21:28:48.885544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-13 21:28:48.887160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-13 21:28:48.888134: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-13 21:28:48.891565: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-13 21:28:48.891603: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-13 21:28:48.891625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-13 21:28:48.891632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-13 21:28:48.891639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
False
>>>
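
The log is long, and the one line that matters is easy to miss. As a rough sketch (the helper name `missing_libraries` is hypothetical, and the pattern assumes the `dso_loader` warning format shown in the log above), a few lines of Python can pull out which libraries failed to load and which `LD_LIBRARY_PATH` TensorFlow saw at that moment:

```python
import re

# Matches the dso_loader warning printed when a shared library cannot be opened.
# The LD_LIBRARY_PATH part is treated as optional in case it is absent from the line.
_MISSING = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'"
    r"(?:.*?LD_LIBRARY_PATH: (?P<path>\S*))?"
)


def missing_libraries(log_text):
    """Return (library, LD_LIBRARY_PATH or None) for every failed load in the log."""
    return [(m.group("lib"), m.group("path")) for m in _MISSING.finditer(log_text)]


# Two lines copied from the log in the question.
sample = (
    "2020-09-13 21:28:48.883415: W tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: "
    "libcublas.so.10: cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64\n"
    "2020-09-13 21:28:48.885196: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10\n"
)

for lib, path in missing_libraries(sample):
    print(f"missing: {lib} (LD_LIBRARY_PATH seen by TensorFlow: {path})")
```

Note that the path reported in the warning is the one the Python process actually had, which here differs from the value listed under "Environment".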

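【Question discussion】:

Does your LD_LIBRARY_PATH point to /usr/lib/cuda-10.1/lib64? If that does not fix it, please post any relevant environment variables you set in ~/.bashrc.
What is the evidence that tf is not using the GPU? Please also include the output log from running TensorFlow, since it shows which CUDA libraries it is trying to load.
I have not added any relevant environment variables to ~/.bashrc. @runDOSrun
I have added the log. @Dr. Snoopy
Yes, now the problem is clear, cublas cannot be found: Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64. Your LD_LIBRARY_PATH is incomplete.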

Tags: python python-3.x tensorflow

【Solution 1】:

Add this to your ~/.bashrc:

export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64

If your lib64 folder is in a different location, adjust the path accordingly.
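
If you are not sure where the library actually lives, a small search can tell you which directory to put in LD_LIBRARY_PATH. This is only a sketch: the helper name `find_library` is hypothetical, and the search roots in the example are assumptions about a typical install, so change them to match your system.

```python
import glob
import os


def find_library(soname, roots):
    """Return sorted paths under the given roots whose file name starts with soname."""
    hits = []
    for root in roots:
        # "**" with recursive=True also matches files directly inside root.
        pattern = os.path.join(root, "**", soname + "*")
        hits.extend(glob.glob(pattern, recursive=True))
    return sorted(set(hits))


if __name__ == "__main__":
    # Assumed candidate locations; adjust for your machine.
    candidate_roots = ["/usr/local", "/usr/lib/x86_64-linux-gnu"]
    for path in find_library("libcublas.so.10", candidate_roots):
        print(os.path.dirname(path))
```

Each printed directory is a candidate for LD_LIBRARY_PATH. If nothing is printed, the library is not under those roots and the path setting alone will not help.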

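By the way, if you often switch between multiple CUDA versions, you can also set the environment variable for a single command directly in the terminal, for example: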

LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64 python myprogram_which_needs_10_1.py

To switch to a different version, just change the path in front of the command.
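
The same per-command idea can be sketched from Python with `subprocess`, which is handy if you launch jobs from a script. The helper name `run_with_cuda` is hypothetical. Unlike the shell one-liner, this sketch prepends the directory to any existing LD_LIBRARY_PATH instead of replacing it; that is my own choice, not part of the answer above.

```python
import os
import subprocess
import sys


def run_with_cuda(lib_dir, argv):
    """Run argv with lib_dir placed first in LD_LIBRARY_PATH, for that child only."""
    existing = os.environ.get("LD_LIBRARY_PATH", "")
    value = lib_dir if not existing else lib_dir + os.pathsep + existing
    env = dict(os.environ, LD_LIBRARY_PATH=value)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)


# Demo child process: it only prints the variable it received.
result = run_with_cuda(
    "/usr/local/cuda-10.1/lib64",
    [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
)
print(result.stdout.strip())
```

The parent process keeps its own environment, so different scripts can be started against different CUDA directories from the same session.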

【Discussion】:

What is this log?
2020-09-14 13:51:04.779092: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
@SajiedShahYousuf Then your CUDA is installed at a different path.
me_sajied@Kunai:~$ whereis cuda
cuda: /usr/local/cuda
me_sajied@Kunai:~$ whereis cuda-10.1
cuda-10: /usr/local/cuda-10.1 /usr/local/cuda-10.2

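【Solution 2】:

2020-09-13 21:28:48.883415: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory;

In my case, this was caused by apt upgrade installing the CUDA 10.2 versions of libcublas10 and libcublas-dev.

My solution to the problem is as follows.

My environment is based on NVIDIA's CUDA repository.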

$ sudo apt install --reinstall libcublas10=10.2.1.243-1 libcublas-dev=10.2.1.243-1

Then hold the packages so that they are no longer offered as upgrade candidates.

$ sudo apt-mark hold libcublas10
$ sudo apt-mark hold libcublas-dev
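
After reinstalling and holding the packages, one way to check the result without starting TensorFlow is to try opening the same libraries yourself. The sketch below uses `ctypes.CDLL`, which goes through the system's dynamic loader; the helper name `check_loadable` is hypothetical, and the list of sonames is simply copied from the log in the question.

```python
import ctypes

# Sonames as they appear in the TensorFlow 2.3.0 log from the question.
REQUIRED = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def check_loadable(sonames):
    """Map each soname to True if the dynamic loader can open it, else False."""
    status = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            status[name] = True
        except OSError:
            status[name] = False
    return status


for name, ok in check_loadable(REQUIRED).items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it in the same shell, with the same LD_LIBRARY_PATH, that you use for TensorFlow. Any library reported as MISSING here should also show up as a "Could not load dynamic library" warning in the TensorFlow log.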

【Discussion】: