【Question title】: cuDNN error during initialization
【Posted】: 2020-03-19 12:55:45
【Question description】:

I am using Anaconda with tensorflow-gpu 2.0. GPU: NVIDIA RTX 2070. Python version: 3.6.9. CUDA: cuda_10.0.130_411.31_win10. cuDNN: cudnn-10.0-windows10-x64-v7.6.5.32. The machine is a laptop running Windows 10.

Hello, when I run this code,

EPOCHS = 5

for epoch in range(EPOCHS):
  for images, labels in train_ds:
    train_step(images, labels)

  for test_images, test_labels in test_ds:
    test_step(test_images, test_labels)

  template = 'epoch: {}, loss: {}, acc: {}, test loss: {}, test acc: {}'
  print (template.format(epoch+1,
                         train_loss.result(),
                         train_accuracy.result()*100,
                         test_loss.result(),
                         test_accuracy.result()*100))

I get this error:

---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-21-fb8a7b9e2d15> in <module>
      3 for epoch in range(EPOCHS):
      4   for images, labels in train_ds:
----> 5     train_step(images, labels)
      6 
      7   for test_images, test_labels in test_ds:

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\def_function.py in __call__(self, *args, **kwds)
    455 
    456     tracing_count = self._get_tracing_count()
--> 457     result = self._call(*args, **kwds)
    458     if tracing_count == self._get_tracing_count():
    459       self._call_counter.called_without_tracing()

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds)
    485       # In this case we have created variables on the first call, so we run the
    486       # defunned version which is guaranteed to never create variables.
--> 487       return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
    488     elif self._stateful_fn is not None:
    489       # Release the lock early so that multiple threads can perform the call

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
   1821     """Calls a graph function specialized to the inputs."""
   1822     graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   1824 
   1825   @property

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs)
   1139          if isinstance(t, (ops.Tensor,
   1140                            resource_variable_ops.BaseResourceVariable))),
-> 1141         self.captured_inputs)
   1142 
   1143   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1222     if executing_eagerly:
   1223       flat_outputs = forward_function.call(
-> 1224           ctx, args, cancellation_manager=cancellation_manager)
   1225     else:
   1226       gradient_name = self._delayed_rewrite_functions.register()

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    509               inputs=args,
    510               attrs=("executor_type", executor_type, "config_proto", config),
--> 511               ctx=ctx)
    512         else:
    513           outputs = execute.execute_with_cancellation(

~\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

~\Anaconda3\envs\DL\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node my_model_1/conv2d_1/Conv2D (defined at C:\Users\hojun\Anaconda3\envs\DL\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_train_step_1276]

Function call stack:
train_step

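I have tried uninstalling and reinstalling everything on the Nvidia GPU side (drivers, different versions of CUDA and cuDNN, and so on), but nothing worked. I tried both TensorFlow 2.0 and 2.1, and the result is the same. Does anyone have an idea how to solve this?

Thanks.

【Comments】:

  • Here is a similar question; hopefully its solution resolves your problem.

Tags: tensorflow gpu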


【Solution 1】:

Based on the tested CUDA and cuDNN combinations listed on the TF site, you could try installing cuDNN 7.4 and see whether that works. Hope this helps!
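Before swapping cuDNN versions, it can help to print what the Python environment and TensorFlow themselves report, so the values can be compared against the tested-combinations table. The following is a minimal sketch and not part of the original answer: `describe_gpu_setup` is a hypothetical helper name, and the script only reports versions and visible GPUs. It does not detect which cuDNN DLL is actually loaded.

```python
import platform


def describe_gpu_setup():
    """Collect version facts to compare against the tested-configurations table."""
    info = {"python": platform.python_version(), "tensorflow": None}
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow cannot be imported in this environment; report that and stop.
        return info

    info["tensorflow"] = tf.__version__
    info["built_with_cuda"] = tf.test.is_built_with_cuda()

    # TF 2.0 exposes this under tf.config.experimental; later releases
    # also provide tf.config.list_physical_devices.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices
    info["gpus"] = [device.name for device in list_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_setup().items():
        print("{}: {}".format(key, value))
```

If `built_with_cuda` is False or the `gpus` list is empty, the problem lies before cuDNN initialization (wrong package or driver/CUDA not found), and changing the cuDNN version alone is unlikely to help.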

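【Comments】:

  • I have already tried all of those combinations as well. Sadly, it still does not work. :(
  • Have you tried downgrading to CUDA 9 and cuDNN 7? If that does not work, why not try upgrading to Python 3.7 and see whether that resolves the issue?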
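
One further thing that may be worth trying, although nobody in this thread confirms it: this error is often reported when TensorFlow cannot reserve enough GPU memory for cuDNN at startup, and enabling on-demand memory allocation is a commonly suggested workaround. The sketch below assumes that cause; `enable_gpu_memory_growth` is a hypothetical helper name, and it must be called before any model or tensor is created on the GPU.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand.

    Returns the number of GPUs for which the setting was applied.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # No TensorFlow in this environment, so nothing to configure.
        return 0

    configured = 0
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU was already initialized before this call.
            print("Could not set memory growth for {}: {}".format(gpu.name, err))
    return configured


if __name__ == "__main__":
    print("GPUs with memory growth enabled:", enable_gpu_memory_growth())
```

In a notebook, this would go in the first cell after a kernel restart, before `train_step` is ever called. If the error persists with memory growth enabled, the version mismatch discussed above remains the more likely cause.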