【Question Title】: Tensorflow: Use model trained in CUDNNLSTM in cpu
【Posted】: 2021-01-11 22:01:19
【Question Description】:

I trained a model in tensorflow using CUDNNLSTM on a GPU. When I try to run inference with the model on a CPU, I get this error:

Invalid argument: No OpKernel was registered to support Op 'CudnnRNN' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

     [[Node: cudnn_lstm/CudnnRNN = CudnnRNN[T=DT_FLOAT, direction="bidirectional", dropout=0, input_mode="linear_input", is_training=false, rnn_mode="lstm", seed=87654321, seed2=4567](Reshape_1, cudnn_lstm/zeros, cudnn_lstm/zeros_1, cudnn_lstm/opaque_kernel/read)]]

So, how can we use this model on a CPU?

【Question Discussion】:

Tags: python tensorflow


【Solution 1】:

Please see the comments in the tensorflow source code for the CuDNN LSTM layer: https://github.com/tensorflow/tensorflow/blob/r1.6/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py

Starting at line 83, they describe how to do what you want. Basically, after training with the CuDNN layer, you need to transfer the weights to a model built with CuDNN Compatible LSTM Cells; such a model will run on both CPU and GPU. Also, as far as I know, the CuDNN LSTM layer in tensorflow is time-major, so don't forget to transpose your input (I am not sure about this in the latest tensorflow versions, so please verify).
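To illustrate the time-major point with a NumPy-only sketch (not code from the answer; the shapes are made-up examples): a batch-major array of shape [batch, time, features] has to be transposed to [time, batch, features] before it is fed to a time-major layer.

```python
import numpy as np

# Hypothetical batch-major input: 4 sequences, 10 timesteps, 8 features.
batch_major = np.zeros((4, 10, 8), dtype=np.float32)

# Swap the batch and time axes to get the time-major layout
# that a time-major CuDNN LSTM layer expects.
time_major = np.transpose(batch_major, (1, 0, 2))

print(time_major.shape)  # (10, 4, 8)
```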

For a short, complete example based on the above, see melgor's gist:

https://gist.github.com/melgor/41e7d9367410b71dfddc33db34cba85f?short_path=29ebfc6

【Discussion】:

  • Do you know how to do this with Keras layers?
  • Please see this discussion between @Yu-Yang and fchollet at github.com/keras-team/keras/pull/8307. It seems there was indeed a bug, and the appropriate changes have been merged into the latest Keras branch. If it still doesn't work for you after updating Keras, you can do one of two things: 1. copy the changes (highlighted in the PR above) into your local Keras files, or 2. get the whole updated Keras repository from the same link. Hope this helps!
  • @OP, please accept the answer if it solved your problem.
  • @Antimony, sorry about that; maybe I will try to look into your code. But I ask that you make it clear to anyone answering that your issue concerns Keras, not what the OP asked. The OP asked how to do this in tensorflow, and I know this works; this is how to do it.
【Solution 2】:
Reason: tensorflow doesn't see your GPU

Fix: install the CUDA Toolkit and cuDNN SDK (compatible with your tf version), then run: 'pip uninstall tensorflow'; 'pip install tensorflow-gpu'

Summary:
    1. check if tensorflow sees your GPU (optional)
    2. check if your video card can work with tensorflow (optional)
    3. find versions of CUDA Toolkit and cuDNN SDK, compatible with your tf version
        (https://www.tensorflow.org/install/source#linux)
    4. install CUDA Toolkit
        (https://developer.nvidia.com/cuda-toolkit-archive)
    5. install cuDNN SDK 
        (https://developer.nvidia.com/rdp/cudnn-archive)
    6. pip uninstall tensorflow; pip install tensorflow-gpu 
    7. check if tensorflow sees your GPU
    * source - https://www.tensorflow.org/install/gpu


Detailed instruction:
    1. check if tensorflow sees your GPU (optional)
        from tensorflow.python.client import device_lib
        def get_available_devices():
            local_device_protos = device_lib.list_local_devices()
            return [x.name for x in local_device_protos]
        print(get_available_devices()) 
        # my output was => ['/device:CPU:0']
        # good output must be => ['/device:CPU:0', '/device:GPU:0']
    2. check if your card can work with tensorflow (optional)
        * my PC: GeForce GTX 1060 notebook (driver version - 419.35), windows 10, jupyter notebook
        * tensorflow needs Compute Capability 3.5 or higher. (https://www.tensorflow.org/install/gpu#hardware_requirements)
        - https://developer.nvidia.com/cuda-gpus
        - select "CUDA-Enabled GeForce Products"
        - result - "GeForce GTX 1060    Compute Capability = 6.1"
        - my card can work with tf!
    3. find versions of CUDA Toolkit and cuDNN SDK, that you need
        a) find your tf version
            import tensorflow as tf
            print(tf.__version__)
            # my output was => 1.13.1
        b) find right versions of CUDA Toolkit and cuDNN SDK for your tf version
            https://www.tensorflow.org/install/source#linux
            * it is written for linux, but worked in my case
            see that tensorflow_gpu-1.13.1 needs: CUDA Toolkit v10.0, cuDNN SDK v7.4
    4. install CUDA Toolkit
        a) install CUDA Toolkit 10.0
            https://developer.nvidia.com/cuda-toolkit-archive
            select: CUDA Toolkit 10.0 and download base installer (2 GB)
            installation settings: select only CUDA
                (my installation path was: D:\Programs\x64\Nvidia\Cuda_v_10_0\Development)
        b) add environment variables:
            system variables / path must have:
                D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\bin
                D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\libnvvp
                D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\extras\CUPTI\libx64
                D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\include
    5. install cuDNN SDK
        a) download cuDNN SDK v7.4
            https://developer.nvidia.com/rdp/cudnn-archive (needs registration, but it is simple)
            select "Download cuDNN v7.4.2 (Dec 14, 2018), for CUDA 10.0"
        b) add path to 'bin' folder into "environment variables / system variables / path":
            D:\Programs\x64\Nvidia\cudnn_for_cuda_10_0\bin
    6.  pip uninstall tensorflow
        pip install tensorflow-gpu 
    7. check if tensorflow sees your GPU
        restart your PC
        print(get_available_devices()) 
        # now this code should return => ['/device:CPU:0', '/device:GPU:0']

【Discussion】:

【Solution 3】:

The reason this does not work is that your JSON file (with your model architecture) is still configured for CuDNNLSTM. Keras can now automatically load CuDNNLSTM weights into an LSTM architecture, but it will not automatically change the architecture for you.

The fix is simple: open your .json file and change every instance of CuDNNLSTM to LSTM. Save the JSON file, and you should then be able to load the weights from the .h5 file.
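That edit can also be scripted with the standard library alone. The snippet below is a minimal sketch; the JSON string stands in for the contents of your own architecture file (a real `model.to_json()` export is much larger):

```python
import json

# A tiny stand-in for the contents of a model.to_json() architecture file.
config = ('{"class_name": "Sequential", "config": {"layers": '
          '[{"class_name": "CuDNNLSTM", "config": {"units": 64}}]}}')

# Point the CuDNNLSTM layer at the plain LSTM implementation instead.
fixed = config.replace("CuDNNLSTM", "LSTM")

# Sanity-check that the result is still valid JSON before saving it.
parsed = json.loads(fixed)
print(parsed["config"]["layers"][0]["class_name"])  # LSTM
```

Apply the same `replace` to the text of your own .json file, write it back, and then load the model with `model_from_json` followed by `load_weights` on your .h5 file.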

【Discussion】:
