Error when constructing a stateful RNN in Keras using a TimeDistributed layer: answers

【Question title】: Error when constructing a stateful RNN in Keras using TimeDistributed layer
【Posted】: 2020-10-16 23:44:35
【Question】:

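I am currently trying to recreate a model from a paper (similar to https://arxiv.org/pdf/1605.00894.pdf, page 5) in Keras with the TensorFlow backend, but I am stuck right at the start: building a stateful RNN after a TimeDistributed layer.

Here is a stripped-down version of the problematic part: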

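from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Conv2D, ConvLSTM2D

model = Sequential()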
model.add(TimeDistributed(
    Conv2D(
        filters=40,
        kernel_size=(3,3),
        padding="same",
        data_format="channels_last"
    ),
    name="C1",
    batch_input_shape=(1, None, 80, 128, 3)
))

# (leaving out the pooling)

model.add(ConvLSTM2D(
    filters=10,
    kernel_size=(3,3),
    padding="same",
    return_sequences=True,
    stateful=True
))

When the stateful argument of the ConvLSTM2D layer is set to True, I get the following error message:

ValueError: If a RNN is stateful, it needs to know its batch size. Specify the batch size of your input tensors: 
- If using a Sequential model, specify the batch size by passing a `batch_input_shape` argument to your first layer.
- If using the functional API, specify the time dimension by passing a `batch_shape` argument to your Input layer.
The same thing goes for the number of rows and columns.
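The message is about the layer's state rather than its input: a stateful layer creates its state once and carries it over from one batch to the next, so the shape of that state has to be fully known when the layer is built. What matters is the static shape that actually reaches ConvLSTM2D, not only what was passed to the first layer. The pure-Python sketch below models that requirement; `conv_lstm_state_shape` is a hypothetical helper and a simplification, not Keras source code.

```python
def conv_lstm_state_shape(input_shape, filters):
    """Hypothetical helper: simplified model of the check behind the error.

    input_shape is (batch, time, rows, cols, channels) as seen by the
    ConvLSTM2D layer, with None for an unknown dimension. Assumes
    padding="same", so rows and cols are unchanged. The state is created
    once per layer, so batch, rows and cols must be known; only the time
    axis may stay None.
    """
    batch, _time, rows, cols, _channels = input_shape
    state_shape = (batch, rows, cols, filters)
    if None in state_shape:
        raise ValueError(
            "If a RNN is stateful, it needs to know its batch size "
            f"(state shape would be {state_shape})"
        )
    return state_shape


# Known batch, unknown time: the check passes.
print(conv_lstm_state_shape((1, None, 80, 128, 40), filters=10))  # prints (1, 80, 128, 10)

# Unknown batch reaching the layer: the check fails.
try:
    conv_lstm_state_shape((None, None, 80, 128, 40), filters=10)
except ValueError as err:
    print("ValueError:", err)
```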

I also tried adding the following input layer, but it did not seem to help:

model.add(Input(
    name="input",
    batch_input_shape=(1, None, 80, 128, 3)
))

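I also built the same model with the functional API (using batch_shape instead of batch_input_shape), but I got the same error message.

I have read several threads on this topic, but none of the solutions seem to work for me. Maybe I am just overlooking some obvious mistake...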

【Question discussion】:

Tags: python tensorflow keras recurrent-neural-network stateful

【Solution 1】:

I moved batch_input_shape inside the Conv2D layer and passed an input shape to model.build():

from tensorflow import keras

model = keras.Sequential()
model.add(keras.layers.TimeDistributed(
    keras.layers.Conv2D(
        filters=40,
        kernel_size=(3,3),
        padding="same",
        data_format="channels_last",
        batch_input_shape=(1, None, 80, 128, 3)
    ),
    name="C1",
))
# ... after adding the other layers (e.g. the stateful ConvLSTM2D)
model.build(input_shape=(1,1,80,128,3))

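Note that I am not sure the input shape is correct. I simply chose a shape that lets the network build instead of raising the error you ran into. This code produces the following network: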

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
C1 (TimeDistributed)         multiple                  1120
_________________________________________________________________
conv_lst_m2d (ConvLSTM2D)    multiple                  18040
=================================================================
Total params: 19,160
Trainable params: 19,160
Non-trainable params: 0
_________________________________________________________________
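The Param # column can be checked by hand, which confirms that C1 sees 3 input channels and that ConvLSTM2D maps 40 input channels to 10 filters, even though the output shapes are only shown as `multiple`. The arithmetic below assumes both layers use biases (the Keras default) and that ConvLSTM2D has the usual four LSTM gates; the helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def conv_lstm2d_params(kernel_h, kernel_w, in_channels, filters):
    # Four gates (input, forget, cell, output), each with a convolution over
    # the input, a convolution over the previous hidden state, and a bias.
    input_kernel = kernel_h * kernel_w * in_channels * filters
    recurrent_kernel = kernel_h * kernel_w * filters * filters
    return 4 * (input_kernel + recurrent_kernel + filters)


c1 = conv2d_params(3, 3, 3, 40)          # 3*3*3*40 + 40 = 1120
lstm = conv_lstm2d_params(3, 3, 40, 10)  # 4 * (3600 + 900 + 10) = 18040
print(c1, lstm, c1 + lstm)               # prints 1120 18040 19160
```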

【Discussion】:

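Thanks, that works, but only partly. With an input shape of (1,1,80,128,3) in build() it compiles, but with (1,None,80,128,3) it does not, and I get the same error message again. Is there any way around this?
To be honest, I'm not sure. I tried building the model with an input shape of (1,None,80,128,3) but could not get rid of the error. Good luck finding the answer.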
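One possible explanation for why a fixed time length builds while None does not, assuming the TensorFlow version in use behaves this way: for built-in layers such as Conv2D, TimeDistributed may apply the wrapped layer by merging the batch and time axes into one, calling the layer, and reshaping back with -1 for the batch axis. If the time length is unknown, that -1 cannot be resolved when the graph is built, so the batch size fixed by batch_input_shape is lost and ConvLSTM2D receives an input whose batch dimension is None. The pure-Python sketch below models only this static-shape bookkeeping; `time_distributed_static_shape` is hypothetical and not part of Keras.

```python
def time_distributed_static_shape(input_shape, inner_tail):
    """Hypothetical helper modelling (as an assumption, not Keras source)
    the static output shape of a reshape-based TimeDistributed wrapper.

    input_shape: (batch, time, ...) with None for unknown dimensions.
    inner_tail:  per-sample output shape of the wrapped layer,
                 e.g. (80, 128, 40) for Conv2D(filters=40, padding="same").
    """
    batch, time = input_shape[0], input_shape[1]

    # Step 1: merge (batch, time) into one axis of size batch * time.
    merged = batch * time if batch is not None and time is not None else None

    # Step 2: the wrapped layer runs on the merged axis; only the tail changes.

    # Step 3: split back with shape (-1, time, *inner_tail). The -1 can only
    # be resolved statically when both the merged size and time are known.
    out_batch = merged // time if merged is not None else None
    return (out_batch, time) + tuple(inner_tail)


# Fixed time length, as in model.build(input_shape=(1, 1, 80, 128, 3)):
print(time_distributed_static_shape((1, 1, 80, 128, 3), (80, 128, 40)))
# prints (1, 1, 80, 128, 40)

# Unknown time length, as in batch_input_shape=(1, None, 80, 128, 3):
print(time_distributed_static_shape((1, None, 80, 128, 3), (80, 128, 40)))
# prints (None, None, 80, 128, 40)
```

If this model is right, the stateful ConvLSTM2D only sees an unknown batch size when the time axis is unknown, which matches the behaviour described in the comments above.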