【Question Title】: Failed to get convolution algorithm
【Posted】: 2023-10-21 12:04:02
【Question】:

I am trying to train a convolutional network. I will omit the part of the code that imports the images and creates the folders for the pictures and classes; below is the main code for the network. The example is taken from the book "Deep Learning with Python" by François Chollet.

from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1177, activation='softmax'))

from keras import optimizers
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    NEW_DATA_DIR,
    target_size=(150, 150),
    batch_size=10,
    class_mode='categorical')

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30)

Then I get this error:

WARNING:tensorflow:From /home/mimus/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/30

---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-7-380fa4b50355> in <module>
      2 train_generator,
      3 steps_per_epoch=100,
----> 4 epochs=30)

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
    529     # Delete the underlying status object from memory otherwise it stays alive
    530     # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d_1/convolution}}]]
     [[{{node loss/mul}}]]

I am using Python 3.6, tensorflow-gpu 1.13.1, keras-gpu 2.2.4, cudatoolkit 10.0.130, cudnn 7.3.1, and numpy 1.16.5. I really hope you can help me solve this problem; thank you in advance.

Edit:

Running nvidia-smi I get:

NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   52C    P8    19W / 170W |   5895MiB /  5931MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1305      G   /usr/lib/xorg/Xorg                            20MiB |
|    0      1356      G   /usr/bin/gnome-shell                          51MiB |
|    0      1556      G   /usr/lib/xorg/Xorg                           123MiB |
|    0      1703      G   /usr/bin/gnome-shell                          74MiB |
|    0      2816      G   /usr/lib/firefox/firefox                       3MiB |
|    0     10133      G   /usr/lib/firefox/firefox                       3MiB |
|    0     18578      C   .../mimus/anaconda3/envs/tf_gpu/bin/python  5605MiB |
+-----------------------------------------------------------------------------+

【Comments】:

  • Do you get any output when you run nvidia-smi in a terminal? What driver version does it show?
  • The driver version is 430.26. I have edited the post with the nvidia-smi output.

Tags: python tensorflow keras deep-learning


【Answer 1】:

The problem was that my GPU did not have enough memory for a network of this size. It can be solved by trying a smaller model. Thanks, everyone.
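
A related, commonly suggested workaround on TensorFlow 1.x for this "Failed to get convolution algorithm" error is to let TensorFlow allocate GPU memory on demand instead of pre-allocating almost all of the card's memory up front (which is what the nvidia-smi output above shows the Python process doing). A minimal sketch of that approach, assuming the versions from the question (tensorflow-gpu 1.13.1, keras-gpu 2.2.4); it would need to run before the model is built:

# Run this before constructing the Keras model, so the TensorFlow
# session is created with on-demand GPU memory allocation instead of
# grabbing (nearly) all of the GPU's memory at startup.
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow GPU memory as needed
set_session(tf.Session(config=config))

If the GPU genuinely does not have enough memory for the model, then reducing the batch size or the number of filters/units, as suggested above, remains the more direct fix.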

【Discussion】: