tensorflow 2.0 gpu 出现问题，未知错误答案

【问题标题】：Problem with tensorflow 2.0 gpu, unknown errortensorflow 2.0 gpu 出现问题，未知错误
【发布时间】：2020-01-25 06:05:21
【问题描述】：

我正在尝试在conda环境下在Windows 10上运行tensorflow-gpu 2.0，代码实际上是TensorFlow页面上的基础教程

from __future__ import absolute_import, division, print_function, unicode_literals 
import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test,  y_test, verbose=2)

我不明白错误并且已经卸载并再次安装了可能是我还没有安装 keras-gpu 吗？我刚刚开始使用这个库请帮助:(

Epoch 1/5
2020-01-24 23:40:35.430377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-01-24 23:40:35.923375: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.933612: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.941088: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.952234: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.961783: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.970378: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-01-24 23:40:35.976378: W tensorflow/stream_executor/stream.cc:1919] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-01-24 23:40:35.986426: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[{{node sequential/dense/MatMul}}]]
   32/60000 [..............................] - ETA: 2:37:06Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
    ctx=ctx)
  File "C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError:  Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
         [[node sequential/dense/MatMul (defined at C:\Users\igorr_z1q8wib\.conda\envs\tf_gpu\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_distributed_function_706]

Function call stack:
distributed_function

>>>
>>> model.evaluate(x_test,  y_test, verbose=2)
2020-01-24 23:40:36.878248: I tensorflow/stream_executor/stream.cc:1868] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not wait for [stream=000002DA3ACFD9A0,impl=000002DA3B9C7F70]
2020-01-24 23:40:36.892612: I tensorflow/stream_executor/stream.cc:4816] [stream=000002DA3ACFDB20,impl=000002DA3B9C8060] did not memcpy host-to-device; source: 000002DAA3AF8C80
2020-01-24 23:40:36.901014: F tensorflow/core/common_runtime/gpu/gpu_util.cc:342] CPU->GPU Memcpy failed```

【问题讨论】：

尝试更新 conda 环境的所有包。使用conda update --all。

标签： python tensorflow keras anaconda conda

【解决方案1】：

Igor，你在设置 GPU 设备吗？

https://devblogs.nvidia.com/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/

https://www.tensorflow.org/guide/gpu

看看你有什么设备：

from \__future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))





## it's possible to set the device manually

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(c)

所以你的代码会是这样的：

with tf.device('/CPU:0'):
    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
        ])

    model.compile(optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5)

    model.evaluate(x_test,  y_test, verbose=2)

【讨论】：

非常感谢！正如您所建议的，看起来我错过了设备放置。如果我尝试将“with tf.device('/CPU:0'):”更改为“with tf.device('/GPU:0'):”会自动选择 Nvidia GPU？还是英特尔 GPU？
嗯，这将取决于您的机器，具体取决于您的系统构建方式。也许查看这篇其他帖子以了解如何检查 stackoverflow.com/questions/38559755/… 是什么设备。一旦你知道你喜欢什么设备号，把它代入代码（'/CPU:0'）或（'/CPU:1'），或者任何有意义的东西