【Posted】: 2021-03-14 13:35:50
【Question】:
I want to train a TensorFlow model on my GPU.
I am using:
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow-estimator 2.4.0
tensorflow-gpu 2.4.1
cuda 11.0
cudnn 8.0.4
gpu RTX 3060 Laptop 6 GB
Nvidia FrameView SDK 1.1.4923.29548709
Nvidia Graphics Drivers 461.72
Nvidia PhysX 9.19.0218
Python 3.8.5
IDE Spyder 4.2.1
OS Windows 10 LTSC-2019 (modified)
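A version list like the one above can be regenerated from the active Python environment rather than typed by hand, which makes mismatches (for example between tensorflow-gpu and tensorflow-estimator) easier to spot. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are simply the ones listed above.

```python
from importlib import metadata  # standard library since Python 3.8

# Package names taken from the list in the question.
PACKAGES = [
    "tensorflow-gpu",
    "tensorflow-estimator",
    "tensorboard",
    "tensorboard-plugin-wit",
]


def installed_versions(names):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports Python packages; CUDA, cuDNN and the driver are installed outside pip and still have to be checked separately.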
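What did I do before posting this request for help?
1/ I installed the Nvidia graphics driver.
2/ I followed this TensorFlow tutorial: https://www.tensorflow.org/install/gpu
So I copied the cuda folder from the downloaded cuDNN archive to C:\tools\
I also added all the required entries to the Path variable.
3/ I tried to train my model (everything works fine if I use the CPU):
with tf.device("/GPU:0"):
    history = model.fit(images, imagesID, epochs=50, validation_split=0.2)
Error: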
2021-03-14 15:07:16.145096: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-03-14 15:07:16.145335: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-03-14 15:07:16.146411: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-03-14 15:07:16.146595: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-03-14 15:07:16.146845: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops_fused_impl.h:697 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
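Because the log points at cuDNN failing to initialize, one sanity check on the manual installation described in step 2 is whether the CUDA and cuDNN DLLs can actually be found through PATH. The sketch below is only an illustration: the helper `find_on_path` is hypothetical, and the DLL names are assumptions based on CUDA 11.0 and cuDNN 8.x on Windows, so adjust them for other versions.

```python
import os
from pathlib import Path

# Assumed DLL names for CUDA 11.0 / cuDNN 8.x on Windows; adjust for other versions.
EXPECTED_DLLS = ["cudart64_110.dll", "cudnn64_8.dll"]


def find_on_path(filename, path_value=None):
    """Return the PATH entries (directories) that contain `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries, which appear when PATH has doubled separators.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for dll in EXPECTED_DLLS:
        hits = find_on_path(dll)
        print(dll, "->", hits if hits else "NOT FOUND on PATH")
```

If a DLL shows up in more than one directory, it may be worth checking that the copies belong to the same CUDA or cuDNN version. Finding the files does not prove cuDNN will initialize; it only rules out a missing PATH entry.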
So I found this online: https://github.com/tensorflow/tensorflow/issues/45779
Based on it, I added this code at the top of my script to limit GPU memory usage:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
Error:
Physical devices cannot be modified after being initialized
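This message indicates that TensorFlow had already initialized the GPU in the current process before `set_memory_growth` was called. In an IDE such as Spyder, one likely cause is a console that is still alive from an earlier run, in which case restarting the kernel before re-running may help. As an alternative, the TensorFlow GPU guide documents the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which requests the same behavior. The following is a minimal sketch that assumes it sits at the very top of the script, in a fresh process.

```python
import os

# Set this before TensorFlow is imported, in a fresh Python process.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is not None:
    # Quick check that TensorFlow can see the GPU at all.
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

Whether this resolves the cuDNN error on a given setup is not guaranteed; it only removes the ordering problem behind the "cannot be modified" message.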
So I found this: https://github.com/tensorflow/tensorflow/issues/25138
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
But I still get the same error:
2021-03-14 15:07:16.145096: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
...
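The session config above uses the TF 1.x compatibility API. For comparison, here is a hedged sketch of how a hard memory cap can be expressed with the TF 2.x configuration API. The helper names `fraction_to_mib` and `cap_gpu_memory` are hypothetical, and the 6144 MiB total is an assumption derived from the 6 GB card listed in the question.

```python
def fraction_to_mib(total_mib, fraction):
    """Convert a memory fraction into a whole number of MiB (rounded down)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mib * fraction)


def cap_gpu_memory(limit_mib):
    """Cap the first GPU at `limit_mib` MiB. Call before any op runs on the GPU.

    Returns False if TensorFlow sees no GPU, True otherwise.
    """
    # Imported lazily so that fraction_to_mib works without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return False
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mib)],
    )
    return True


if __name__ == "__main__":
    # 6144 MiB assumes a 6 GB card; 0.2 mirrors the fraction in the session config.
    print(fraction_to_mib(6144, 0.2))
```

A script would call `cap_gpu_memory(fraction_to_mib(6144, 0.2))` once, before building the model. As far as I know, a fixed memory limit and memory growth cannot both be configured on the same device through this API, so it is one or the other.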
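I am completely lost, because I know nothing about TensorFlow GPU errors...
The full logs are here: https://pastebin.com/Xtsv3mLe
I am not very good at writing posts, so I hope this is clear enough.
Thank you in advance!!
【Comments】: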
-
So I uninstalled TensorFlow GPU 2.4.1 and installed 2.2.0 instead. This is the error: pastebin.com/XLFufzjZ I tried renaming the lib in the CUDA bin folder, but I still get this error...
-
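Try installing the estimator version that matches your tensorflow version. Also, are you using conda? I usually install tensorflow 2.4 with conda and everything else with pip. conda install cudatoolkit works great: you can choose the version, and you can also run conda install cudnn, and it all works well. This article might help you: towardsdatascience.com/… Take a look; I think it contains all the information you need.
Tags: python tensorflow gpu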