Posted: 2021-01-14 14:13:28
Question:
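I'm trying to get started with Keras. I have a new Nvidia GPU, but even on a fresh install of Ubuntu (20.04) I can't seem to get it working.
On my first attempt, I noticed that Ubuntu had detected my graphics card, so I installed its driver through "Additional Drivers". Then I installed Keras and TensorFlow with the following commands, which produced no errors: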
install.packages("keras")
library(keras)
install_keras(tensorflow = "gpu")
However, when I try to actually build a Keras model:
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
I get this dreadful error message:
2021-01-14 09:04:53.188680: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-14 09:04:53.189214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-14 09:04:53.224466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 09:04:53.224843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-14 09:04:53.224860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-14 09:04:53.226413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-01-14 09:04:53.226446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-01-14 09:04:53.226935: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-14 09:04:53.227061: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-14 09:04:53.227139: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/arta/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib:/usr/local/cuda-11.2/lib64:::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server:/usr/local/cuda-11.2/lib64
2021-01-14 09:04:53.227437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-01-14 09:04:53.227513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-14 09:04:53.227519: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-01-14 09:04:53.228275: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-14 09:04:53.228290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-14 09:04:53.228293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
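To see at a glance which libraries TensorFlow opened and which it could not, you can filter the dso_loader lines of a log like the one above. Below is a minimal base-R sketch of that. `summarise_dso_log` is a hypothetical helper and is not part of the keras or tensorflow packages. It assumes only the message wording that appears in this log.

```r
# Hypothetical helper (not from any package): split TensorFlow's dso_loader
# messages into libraries that were opened and libraries that failed to load.
# Assumes the message wording seen in the log above.
summarise_dso_log <- function(log_lines) {
  opened_pat  <- "Successfully opened dynamic library ([^[:space:]]+)"
  missing_pat <- "Could not load dynamic library '([^']+)'"
  # regexec() + regmatches() give c(full_match, capture) for each matching
  # line, or character(0) for lines that do not match
  grab <- function(pattern) {
    m <- regmatches(log_lines, regexec(pattern, log_lines))
    as.character(unlist(lapply(m, function(x) if (length(x) > 1) x[2] else character(0))))
  }
  list(opened = grab(opened_pat), missing = grab(missing_pat))
}

# Excerpts copied from the log in the question
log_lines <- c(
  "I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0",
  "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory",
  "I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8"
)
str(summarise_dso_log(log_lines))
```

Pasting the full log into `log_lines` gives the complete list of libraries that loaded and those that are missing.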
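You may notice that this error message mentions cuda-11.2. However, I got almost the same error message when I used the system's default cuda-10.1, which I assume came with the driver.
I've tried many things, including downloading cuDNN directly from Nvidia's website and installing it following their documentation, and adding CUDA to PATH and LD_LIBRARY_PATH, but to no avail.
Finally, I deleted my r-reticulate conda environment so I could reinstall TensorFlow from scratch, but with CUDA 11.2 instead of the default 10.1.
I followed the instructions in this blog post, but replaced every instance of 10.1 with 11.2 and libcudnn.so.7 with libcudnn.so.8, since that is the latest available version and the one I downloaded to my system. That led me to the error message above, which is almost identical to the one I got with 10.1, my computer's default.
Also, when I tried to use TensorFlow in R again, I noticed something strange. I installed it with install_keras(tensorflow = "gpu") with no apparent problems, but when I call the following: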
imdb <- dataset_imdb(num_words = 10000)
it starts downloading and installing it for me again, and gives me this warning:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-gpu 2.2.0 requires tensorboard<2.3.0,>=2.2.0, but you have tensorboard 2.4.0 which is incompatible.
tensorflow-gpu 2.2.0 requires tensorflow-estimator<2.3.0,>=2.2.0, but you have tensorflow-estimator 2.4.0 which is incompatible.
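Each conflict line above compares an installed version against a version range. As an illustration only, the hypothetical base-R function below checks a version against a comma-separated pip-style range using `package_version()`. It supports only the `<`, `<=`, `>`, `>=` and `==` operators and ignores PEP 440 details such as pre-releases, so it is a simplification and not a substitute for pip's resolver.

```r
# Hypothetical helper: does `installed` satisfy a pip-style range such as
# "<2.3.0,>=2.2.0"? Only <, <=, >, >= and == are handled (no PEP 440 extras).
satisfies <- function(installed, spec) {
  clauses <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  v <- package_version(installed)
  all(vapply(clauses, function(cl) {
    # perl = TRUE so the two-character operators are tried before < and >
    m <- regmatches(cl, regexec("^(<=|>=|==|<|>)[[:space:]]*(.+)$", cl, perl = TRUE))[[1]]
    if (length(m) != 3) stop("unsupported clause: ", cl)
    bound <- package_version(m[3])
    switch(m[2],
      "<"  = v <  bound,
      "<=" = v <= bound,
      ">"  = v >  bound,
      ">=" = v >= bound,
      "==" = v == bound)
  }, logical(1)))
}

# tensorboard 2.4.0 against the range tensorflow-gpu 2.2.0 requires
satisfies("2.4.0", "<2.3.0,>=2.2.0")  # FALSE
```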
What should I do? Why is it able to use the correct CUDA installation here:
2021-01-14 09:00:06.766462: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
but not another file elsewhere?
2021-01-14 09:04:53.227139: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/arta/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib:/usr/local/cuda-11.2/lib64:::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server:/usr/local/cuda-11.2/lib64
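One direct way to check this is to look for the exact file name the loader asked for in each `LD_LIBRARY_PATH` directory. The sketch below is a hypothetical base-R helper and only mimics the `LD_LIBRARY_PATH` part of the search. The real dynamic loader also consults other locations, such as `/etc/ld.so.cache` and the default system directories. For simplicity, the helper also skips the empty entries (the `:::`) visible in the path above.

```r
# Hypothetical helper: return the LD_LIBRARY_PATH directories that contain a
# file with exactly the requested name (e.g. "libcusolver.so.10").
# Only approximates the loader: ld.so.cache and default dirs are not checked.
find_in_ld_path <- function(lib, ld_path = Sys.getenv("LD_LIBRARY_PATH")) {
  dirs <- strsplit(ld_path, ":", fixed = TRUE)[[1]]
  dirs <- unique(dirs[nzchar(dirs)])  # drop empty entries, e.g. from ":::"
  dirs[file.exists(file.path(dirs, lib))]
}

# An empty result means no directory on LD_LIBRARY_PATH has that exact file name
find_in_ld_path("libcusolver.so.10")
find_in_ld_path("libcudart.so.11.0")
```

The lookup matches the file name exactly, including the version suffix. A file with the same base name but a different suffix in the same directory will not be reported.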
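What should I do now? Why can't I get GPU acceleration to work? My plan is to follow the instructions in that blog post to purge all Nvidia software from Ubuntu and then try again with 10.1, since that seems to be the most stable version.
Comments: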
-
Your TF expects CUDA 11.0. You have CUDA 11.2. You cannot use CUDA 11.2 in place of CUDA 11.0.
-
@RobertCrovella If I purge all Nvidia CUDA-related software and graphics drivers and start over, which version of CUDA would you recommend for compatibility with libcudnn and TensorFlow?
Tags: r tensorflow keras