【发布时间】:2020-02-18 13:06:37
【问题描述】:
我已经按照these instructions 安装了带有 TensorFlow 后端的 Keras:
library(keras)
install_keras(tensorflow = "gpu")
安装很顺利,我没有错误消息。
如果我输入:
k = backend()
sess = k$get_session()
sess$list_devices()
据我了解的输出,我的 GPU 似乎已被识别:
[[1]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3277741456357329757)
[[2]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 14524037525637335634)
[[3]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 5788527260077506513)
我的.profile 文件如下所示:
export CUDA_HOME=${CUDA_PATH}
export PATH="${CUDA_PATH}/bin:$PATH"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_PATH}/lib64/
我还可以列出所有与 Nvidia 相关的包:
[ben@Solgaleo ~]$ pacman -Qs nvidia*
local/cuda 10.2.89-3
NVIDIA's GPU programming toolkit
local/cudnn 7.6.5.32-3
NVIDIA CUDA Deep Neural Network library
local/lib32-nvidia-utils 440.59-1
NVIDIA drivers utilities (32-bit)
local/libvdpau 1.3-1
Nvidia VDPAU library
local/libxnvctrl 440.59-1
NVIDIA NV-CONTROL X extension
local/nvidia 440.59-8
NVIDIA drivers for linux
local/nvidia-settings 440.59-1
Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 440.59-1
NVIDIA drivers utilities
local/nvtop 1.0.0-2
An htop like monitoring tool for NVIDIA GPUs
local/opencl-nvidia 440.59-1
OpenCL implemention for NVIDIA
但是当我建立一个 Keras 模型时,一些库文件没有找到:
library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')
这是错误信息:
2020-02-18 13:45:23.530693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-18 13:45:23.609674: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.610276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:09:00.0
2020-02-18 13:45:23.610420: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610508: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610597: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610680: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610761: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610842: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.646497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-18 13:45:23.646508: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-02-18 13:45:23.647124: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-18 13:45:23.669292: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3794460000 Hz
2020-02-18 13:45:23.670124: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b541a72c0 executing computations on platform Host. Devices:
2020-02-18 13:45:23.670138: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-02-18 13:45:23.670530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-18 13:45:23.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
2020-02-18 13:45:23.982097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.982507: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b5423b030 executing computations on platform CUDA. Devices:
2020-02-18 13:45:23.982529: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
确实,libcudart.so.10.0(例如)找不到,因为它不存在:
[ben@Solgaleo ~]$ ll /opt/cuda/lib64/libcudart.so*
lrwxrwxrwx 1 root root 20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root 20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10 -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root 20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2 -> libcudart.so.10.2.89
-rwxr-xr-x 1 root root 498K Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2.89
所以 TensorFlow 正在寻找 10.0 版本,而我已经安装了 10.2。
在训练我的模型时,只使用 CPU。
我在安装 Keras/TensorFlow 时遇到了什么问题?我该如何解决这个问题?
编辑:以下是 Keras 和 TensorFlow R 软件包的版本:
keras_2.2.5.0
tensorflow_2.0.0
【问题讨论】:
-
查看第一个输出我会说它只识别 CPU:_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3277741456357329757) TensorFlow 版本适用于特定的 CUDA 版本,因此对于 tf 2.0,您需要降级 CUDA 库(或尽可能并行运行它们)
-
如果是这种情况(顺便说一下:stackoverflow.com/questions/55925068/…),事情就没那么简单了……当我尝试降级 CUDA 时,它拒绝了,因为它找不到 gcc7(我的系统最多日期,所以我运行 gcc 9)。如果我降级 gcc,我害怕破坏某些东西......
-
显然,TensorFlow 2.1.0 支持 CUDA 10.1 (tensorflow.org/install/gpu),这可能更容易降级(虽然我没有尝试过)。 TF 通过 R 脚本安装后如何升级到 2.1.0?
-
看起来可以通过指定tf的版本来完成:tensorflow.rstudio.com/installation
install_tensorflow(version = "2.1.0")。关于CUDA安装问题,你用的是什么操作系统? -
nvm os的问题,刚看到pacman命令;我用
yaourt cuda-10.1安装了cuda-10.1,这样任何依赖项(如gcc)都将得到解决
标签: r tensorflow keras