InternalError (see above for traceback): Blas GEMM launch failed

【Question title】: InternalError (see above for traceback): Blas GEMM launch failed
【Posted】: 2017-11-01 01:49:16
【Question description】:

I trained a model with Keras and want to evaluate it, but I always get this error. I found a solution here, "TensorFlow: InternalError: Blas SGEMM launch failed", but it only applies to plain TensorFlow.

Using TensorFlow backend.
2017-11-01 10:40:49.120525: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-01 10:40:49.120546: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-01 10:40:49.120553: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-01 10:40:49.120557: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-01 10:40:49.120562: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-11-01 10:40:49.266103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-01 10:40:49.266511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 1.189
pciBusID 0000:01:00.0
Total memory: 1.96GiB
Free memory: 1.78GiB
2017-11-01 10:40:49.266528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-11-01 10:40:49.266534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-11-01 10:40:49.266542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
2017-11-01 10:40:54.162805: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-11-01 10:40:54.162825: W tensorflow/stream_executor/stream.cc:1756] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/home/viktor/PycharmProjects/ProjectSSD/test.py", line 39, in <module>
    scores = model.evaluate(x_test_bin, y_test, verbose=1)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/models.py", line 896, in evaluate
    sample_weight=sample_weight)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1657, in evaluate
    steps=steps)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1339, in _test_loop
    batch_outs = f(ins_batch)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
    **self.session_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(32, 2304), b.shape=(2304, 512), m=32, n=512, k=2304
     [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]

Caused by op u'dense_1/MatMul', defined at:
  File "/home/viktor/PycharmProjects/ProjectSSD/test.py", line 13, in <module>
    model = load_model(save_dir + '/' + model_name)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/models.py", line 239, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/models.py", line 313, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/models.py", line 1214, in from_config
    model.add(layer)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/models.py", line 475, in add
    output_tensor = layer(self.outputs[0])
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/layers/core.py", line 841, in call
    output = K.dot(inputs, self.kernel)
  File "/home/viktor/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 998, in dot
    out = tf.matmul(x, y)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1844, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1289, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(32, 2304), b.shape=(2304, 512), m=32, n=512, k=2304
     [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](flatten_1/Reshape, dense_1/kernel/read)]]

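Here is the code. It first loads the model, then loads the dataset, prepares the data for evaluation, and finally runs the evaluation.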

from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.models import load_model
import numpy as np

import os

num_classes = 10
save_dir = os.path.join(os.getcwd(), 'examples/saved_models')
model_name = 'keras_cifar10_trained_model.h5'
model = load_model(save_dir + '/' + model_name)

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train_float = x_train.astype('float32')
x_test_float = x_test.astype('float32')
x_train_bin = x_train_float / 255
x_test_bin = x_test_float / 255

# Score trained model.
scores = model.evaluate(x_test_bin, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

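I found that people run into the same problem when they run two sessions at the same time. When the line

model = load_model(save_dir + '/' + model_name)

executes, GPU memory usage goes up (observed with watch -n 0.5 nvidia-smi). Maybe that is the problem?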
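To check the same thing from Python instead of watching nvidia-smi in a terminal, something like the sketch below could be called before and after load_model. It assumes nvidia-smi is on the PATH and that the card reports numeric values for memory.used and memory.total; the helper names parse_gpu_memory and query_gpu_memory are made up for this illustration.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' pairs (in MiB), one line per GPU."""
    usage = []
    for line in csv_text.strip().splitlines():
        # Each line looks like "183, 2002"; int() tolerates the padding spaces.
        used, total = (int(field) for field in line.split(','))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every visible GPU."""
    out = subprocess.check_output([
        'nvidia-smi',
        '--query-gpu=memory.used,memory.total',
        '--format=csv,noheader,nounits',
    ])
    return parse_gpu_memory(out.decode('utf-8'))
```

Calling query_gpu_memory() right before and right after load_model shows how much memory the process grabs at that point. If another process already holds most of the card, that shows up in the first reading.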

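【Comments】:

I personally get this error when one of my IPython notebooks is still running on the GPU, or when my GPU memory is already full at startup.
Thanks for your answer. The memory is full after this command: model = load_model(save_dir + '/' + model_name), and it crashes when scores = model.evaluate(x_test_bin, y_test, verbose=1) runs.
Then try evaluating your model in batches instead of on all the data.
Try running this, for example: scores = model.evaluate(x_test_bin[:10], y_test[:10], verbose=1). Do you still get the error?
I get the same error with scores = model.evaluate(x_test_bin[:10], y_test[:10], verbose=1), and even when I only want to predict a single example: image_bin = np.expand_dims(x_test_bin[0], axis=0); y_predict = model.predict(image_bin, verbose=1). scores = model.evaluate(x_test_bin, y_test, batch_size=32, verbose=1) does not work either.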
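Since the comments point at GPU memory being exhausted, one thing that is often tried with the TensorFlow backend is to hand Keras a session that allocates GPU memory on demand instead of reserving almost all of it up front. The following is only a sketch under the assumption of TensorFlow 1.x with standalone Keras 2.x, as in the traceback above; the function name limit_keras_gpu_memory and its fraction parameter are hypothetical, and whether this fixes the error in this particular case is not established.

```python
def limit_keras_gpu_memory(fraction=None):
    """Give Keras a TF 1.x session with on-demand GPU memory allocation.

    fraction (hypothetical parameter): optional cap on the share of GPU
    memory this process may use, e.g. 0.5 for half of the card.
    """
    if fraction is not None and not 0.0 < fraction <= 1.0:
        raise ValueError('fraction must be in (0, 1], got %r' % (fraction,))

    # Imported inside the function so the argument check above works
    # even on a machine where TensorFlow and Keras are not installed.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Grow the allocation as needed instead of grabbing all memory at once.
    config.gpu_options.allow_growth = True
    if fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = fraction

    # Every Keras call made after this point uses the configured session.
    K.set_session(tf.Session(config=config))
```

It would have to be called once at the top of the script, before load_model, for example limit_keras_gpu_memory(fraction=0.5). It does not help if a different process is the one holding the memory.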

Tags: python machine-learning tensorflow neural-network keras

【Solution 1】:

Anthony D'Amato, I am sorry for wasting your time.

The error comes from a part of the code that involves cv2. I have opened a new question:

cv2, keras, InternalError (see above for traceback): Blas GEMM launch failed

Thank you very much. You helped me get closer to a solution.

【Discussion】: