【Question title】: keras(-gpu) + tensorflow-gpu + anaconda on Kubuntu
【Posted】: 2025-12-29 04:35:15
【Question】:

I have Kubuntu 18.04 and Anaconda 5.2 (64-bit). I installed the CUDA drivers, keras-gpu and tensorflow-gpu (which automatically installed tensorflow as well).

The following code

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

import tensorflow as tf
print('Tensorflow: ', tf.__version__)

gives the output

2018-07-29 12:14:06.821996: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-29 12:14:06.880569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-29 12:14:06.880910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.2155
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 2.72GiB
2018-07-29 12:14:06.880924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-07-29 12:14:07.058984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-29 12:14:07.059012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0
2018-07-29 12:14:07.059017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N
2018-07-29 12:14:07.059114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 2430 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 18195666940796676435
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 2548367360
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7016427886680347829
physical_device_desc: "device: 0, name: GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2"
]
Using TensorFlow backend.
Tensorflow:  1.9.0

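So it looks like Keras is using TensorFlow on the CPU instead of the GPU (when running DeepBach, one of my CPU cores is at 100%)? What am I doing wrong?

How can I find out which device Keras/DeepBach is using? nvidia-smi shows no GPU utilization while I train with Keras/DeepBach. How can I tell Keras/DeepBach to use the GPU instead of the CPU?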
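One way to see where Keras actually places its ops is TensorFlow 1.x's device-placement log. The sketch below assumes TensorFlow 1.x with standalone Keras 2.x on the TensorFlow backend, as in the question. `tf.ConfigProto`, `tf.Session` and `keras.backend.set_session` are 1.x-era APIs and are not available at the top level in TensorFlow 2.x. The helper names `gpu_device_names` and `use_logging_session` are hypothetical.

```python
# Sketch for TensorFlow 1.x + standalone Keras 2.x (TensorFlow backend).
# In TensorFlow 2.x, tf.ConfigProto / tf.Session no longer exist at the top level.


def gpu_device_names(devices):
    """Return the names of the GPU entries in a device list.

    `devices` is expected to look like the result of
    tensorflow.python.client.device_lib.list_local_devices(): objects that
    have `name` and `device_type` attributes.
    """
    return [d.name for d in devices if d.device_type == "GPU"]


def use_logging_session():
    """Make Keras use a TF session that logs the device assigned to every op."""
    import tensorflow as tf  # imported lazily so gpu_device_names works without TF
    from keras import backend as K

    config = tf.ConfigProto(log_device_placement=True)
    # Grow GPU memory on demand instead of reserving most of it at start-up.
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    K.set_session(sess)  # Keras uses this session from now on
    return sess
```

Call `use_logging_session()` before building the model. When the graph runs, TensorFlow writes each op's assigned device to stderr, so ops mapped to `/device:GPU:0` confirm that the GPU is used. With the TensorFlow backend, ops that have a GPU kernel are placed on the first visible GPU by default, so an explicit `with tf.device('/gpu:0'):` block is normally not needed.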

CUDA appears to be installed:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

$ nvidia-smi
Sun Jul 29 12:10:28 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 00000000:01:00.0  On |                  N/A |
|  4%   62C    P0    47W / 180W |   1160MiB /  4040MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1113      G   /usr/lib/xorg/Xorg                           562MiB |
|    0      1385      G   kwin_x11                                     152MiB |
|    0      1395      G   /usr/bin/krunner                               2MiB |
|    0      1399      G   /usr/bin/plasmashell                         167MiB |
|    0     26801      G   ...-token=2DD4BBFEA86302FEC3C179E07D55C897   267MiB |
+-----------------------------------------------------------------------------+
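In nvidia-smi's process table, the Type column is `G` for graphics clients and `C` for CUDA compute processes (`C+G` for both). Every process listed above is `G`, so no CUDA process was on the GPU when this snapshot was taken (12:10, a few minutes before the TensorFlow log above). The hypothetical parser below assumes the table layout shown here, which may differ between driver versions. It picks out compute processes so you can check whether the training process shows up while it runs.

```python
import re

# One row of the "Processes:" table, e.g.
# |    0      1113      G   /usr/lib/xorg/Xorg                           562MiB |
PROCESS_ROW = re.compile(
    r"^\|\s+(\d+)\s+(\d+)\s+(C\+G|C|G)\s+(.+?)\s+(\d+)MiB\s*\|$"
)


def parse_processes(smi_text):
    """Extract GPU index, PID, type, name and memory (MiB) from nvidia-smi output."""
    procs = []
    for line in smi_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            gpu, pid, kind, name, mib = match.groups()
            procs.append({"gpu": int(gpu), "pid": int(pid), "type": kind,
                          "name": name, "mib": int(mib)})
    return procs


def compute_processes(smi_text):
    """Keep only CUDA compute clients (type C or C+G); G is graphics-only."""
    return [p for p in parse_processes(smi_text) if "C" in p["type"]]
```

Fed the table above, `parse_processes` returns the five `G` rows and `compute_processes` returns an empty list. During GPU training, the Python process should appear with type `C`.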

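【Question discussion】:

  • Your log doesn't show which device is being used; you need to be more specific about the problem. TF also detected the GPU correctly, so I don't see a problem.
  • Comments are not for asking new questions; they are for clarifying your question.

Tags: python tensorflow keras anaconda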


【Solution 1】:

While your code is running, check the system monitor to see whether the GPU is involved. In particular, check the GPU's memory usage.
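Instead of a GUI system monitor, `nvidia-smi` can report utilization and memory in a script-friendly form via `--query-gpu=...` together with `--format=csv,noheader,nounits`. Below is a minimal polling sketch. It assumes `nvidia-smi` is on PATH and reports numeric values, and the function names are hypothetical. It uses `subprocess.check_output` so it also runs on older Python 3 versions.

```python
import shutil
import subprocess
import time

# Per-GPU index, utilization (%) and used memory (MiB), one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn lines like '0, 1, 1160' into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        stats.append({"gpu": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return stats


def poll_gpu(seconds=30, interval=1.0):
    """Print GPU utilization and memory every `interval` seconds for `seconds`."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.check_output(QUERY, universal_newlines=True)
        for s in parse_gpu_stats(out):
            print("GPU {gpu}: {util_pct}% util, {mem_mib} MiB used".format(**s))
        time.sleep(interval)
```

Run `poll_gpu()` in a second terminal while training. If utilization stays near 0% and memory stays near the desktop's baseline (about 1160 MiB in the snapshot above), the training is not using the GPU. By default TensorFlow 1.x reserves most free GPU memory when its session starts (the log reports 2430 MB), so a jump of roughly that size is a strong sign the GPU is in use. `nvidia-smi -l 1` gives a similar refreshing view without a script.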

【Discussion】:

【Solution 2】:

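I think you compiled TensorFlow with CUDA support (or installed a prebuilt package with it), but without support for all the instructions your CPU provides. Your CPU supports the SSE4.1, SSE4.2, AVX, AVX2 and FMA instructions, which TensorFlow could use.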

This means TensorFlow works normally (with full GPU support), but operations that run on the CPU don't take full advantage of your processor.
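To confirm which of those extensions the CPU offers, Linux lists them in the `flags` line of `/proc/cpuinfo`, spelled `sse4_1`, `sse4_2`, `avx`, `avx2` and `fma`. The following is a small Linux-only sketch with hypothetical function names.

```python
# /proc/cpuinfo spellings of the extensions named in the TensorFlow log message.
WARNED = ["sse4_1", "sse4_2", "avx", "avx2", "fma"]


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text, wanted=WARNED):
    """List which of `wanted` appear in the CPU's flags, keeping their order."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name in flags]
```

For example, `supported_extensions(open("/proc/cpuinfo").read())`. This only concerns ops that run on the CPU. The log line itself is informational (the `I` prefix) and does not mean the GPU is unused.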

Try comparing timings (GPU vs. CPU) with this example: https://*.com/a/54661896/10418812
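A comparable timing test (not necessarily the one in the linked answer) could look like the minimal TensorFlow 1.x sketch below. It times the same matrix product on `/cpu:0` and `/gpu:0`. The matrix size, repeat count and helper names are arbitrary/hypothetical. `tf.random_normal`, `tf.device` and `tf.Session` are TF 1.x APIs.

```python
import time


def mean_runtime(fn, repeats=10, warmup=1):
    """Average wall-clock seconds per call of fn(), after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def matmul_seconds(device, n=4000, repeats=10):
    """Time an n x n matrix product on `device` (TensorFlow 1.x graph mode)."""
    import tensorflow as tf  # TF 1.x API: tf.Session / tf.random_normal

    graph = tf.Graph()
    with graph.as_default(), tf.device(device):
        a = tf.random_normal((n, n))
        b = tf.random_normal((n, n))
        # Reduce to a scalar so fetching the result back to Python is cheap.
        total = tf.reduce_sum(tf.matmul(a, b))
    with tf.Session(graph=graph) as sess:
        # The warm-up call absorbs one-time costs such as CUDA initialization.
        return mean_runtime(lambda: sess.run(total), repeats=repeats, warmup=1)
```

Usage: `print(matmul_seconds("/cpu:0"), matmul_seconds("/gpu:0"))`. On a working GPU setup, the second number is typically much smaller. Both runs include generating the random inputs, so the comparison stays like-for-like.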

【Discussion】: