GPU errors when running TensorFlow AI

【Question title】: GPU errors when running TensorFlow AI
【Posted】: 2021-04-18 22:00:00
【Question description】:

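I am following a beginner TensorFlow tutorial and trying out classification. I get a bunch of GPU errors. I have installed the CUDA toolkit as well as the latest driver for my GPU. Here is the output: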

2021-01-13 15:42:24.186914: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-01-13 15:42:24.187065: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
2021-01-13 15:42:26.282013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-13 15:42:26.302224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.86GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2021-01-13 15:42:26.302958: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-01-13 15:42:26.303513: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-01-13 15:42:26.304062: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
starting training
2021-01-13 15:42:26.307161: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-13 15:42:26.308219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-13 15:42:26.312354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-13 15:42:26.312941: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-01-13 15:42:26.313499: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-01-13 15:42:26.313623: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-01-13 15:42:26.314323: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-13 15:42:26.315481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-13 15:42:26.315604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\levig\AppData\Local\Temp\tmpbmbc3as1
WARNING:tensorflow:From C:\Users\levig\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\training\training_util.py:235: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From C:\Users\levig\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\optimizer_v2\adagrad.py:82: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor
2021-01-13 15:42:27.410575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.86GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2021-01-13 15:42:27.410786: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-01-13 15:42:27.474456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-13 15:42:27.474571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0
2021-01-13 15:42:27.474637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0: N
2021-01-13 15:42:27.482654: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:258] None of the MLIR optimization passes are enabled (registered 0 passes)

Here is my code:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

import pandas as pd
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
# Here we use keras (a module inside of TensorFlow) to grab our datasets and read them into a pandas dataframe
train_y = train.pop('Species')
test_y = test.pop('Species')
train.head() # the species column is now gone


def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 3 classes.
    n_classes=3)

print("starting training")

classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)

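【Question comments】:

As the error says: Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices. In particular: Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found. Is it on your PATH? Does it ship with your TF library? You may need to install CUDA and create a symlink to cudart64_110.dll.
Please do not add an answer to your question. I have rolled back/edited your question. Please write a new answer instead.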
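
The PATH question raised in the first comment can be checked directly. Below is a minimal sketch using only the Python standard library. The helper name `find_on_path` is hypothetical, and the DLL names are the ones the log reports as not found. It only tests whether a file with that name exists in a PATH directory, which approximates how Windows locates DLLs rather than reproducing the exact loader behavior.

```python
import os

# DLL names reported as "not found" in the log of the question.
MISSING_DLLS = [
    "cudart64_110.dll",
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cusparse64_11.dll",
    "cudnn64_8.dll",
]


def find_on_path(filename, path=None):
    """Return the first PATH directory that contains filename, or None.

    find_on_path is a hypothetical helper, not part of TensorFlow or CUDA.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    for dll in MISSING_DLLS:
        location = find_on_path(dll)
        print(f"{dll}: {location if location else 'not found on PATH'}")
```

If every name comes back as not found, the CUDA and cuDNN `bin` directories are probably either not installed or not listed in PATH.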

Tags: tensorflow deep-learning

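【Solution 1】:

From the comments

Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the GPU Support guide for how to download and setup the required libraries for your platform. (Paraphrased from Soleil)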
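
To see which libraries the guide has to supply, the missing names can be pulled out of the log. This is a sketch under assumptions: the log is available as text in the untranslated TensorFlow message format, and `missing_libraries` is a hypothetical helper name. The `sample` string is a shortened excerpt of the messages from the question.

```python
import re

# Matches the warning text emitted by dso_loader.cc in the log of the question.
_PATTERN = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names the log reports as not loadable, in order, without duplicates.

    missing_libraries is a hypothetical helper, not a TensorFlow API.
    """
    seen = []
    for name in _PATTERN.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found\n"
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:49] "
    "Successfully opened dynamic library nvcuda.dll\n"
    "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found\n"
)
print(missing_libraries(sample))  # ['cudart64_110.dll', 'cudnn64_8.dll']
```

The suffixes in these names (`_110`, `_8`) point to CUDA 11.0 and cuDNN 8. Once those are installed and on PATH, `tf.config.list_physical_devices('GPU')` should return a non-empty list instead of TensorFlow skipping GPU registration.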

【Discussion】: