Feed CNN features to LSTM

[Question title]: Feed CNN features to LSTM
[Posted]: 2017-04-28 13:00:16
[Question]:

I want to build an end-to-end trainable model with the following properties:

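A CNN extracts features from the image
The features are reshaped into a matrix
Each row of this matrix is then fed to LSTM1
Each column of this matrix is then fed to LSTM2
The outputs of LSTM1 and LSTM2 are concatenated to form the final output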

(This is much like Figure 2 of this paper: https://arxiv.org/pdf/1611.07890.pdf)

My question is: after the reshape, how do I feed the values of the feature matrix to the LSTMs using Keras or TensorFlow?

Here is my code so far, using the VGG16 network (also posted on Keras issues):

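from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape, LSTM

# VGG16, block 1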
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))

# reshape the 4096-d feature vector into a 64 x 64 matrix (4096 = 64 * 64)
model.add(Reshape((64, 64)))

# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct: 
# model.add(LSTM(256, input_shape=(64, 1)))  # 256 hidden units, sequence length = 64, feature dim = 1
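One possible reading of the row/column steps listed above, sketched with the Keras functional API (a Sequential model cannot branch into two LSTMs). Assumptions, not taken from the paper: each row (or column) of the 64 x 64 matrix is treated as one timestep with 64 features, so each LSTM sees a sequence of length 64 with feature dim 64 rather than the (64, 1) shape in the commented attempt; 256 units comes from that attempt; the `Input` layer stands in for the `Dense(4096)` output; `row_col_head` is a hypothetical name. The exact arrangement in the paper's Figure 2 may differ.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in (assumption) for the 4096-d output of the last Dense(4096) layer.
features = keras.Input(shape=(4096,))

# Reshape into a 64 x 64 matrix: axis 1 indexes rows, axis 2 indexes columns.
matrix = layers.Reshape((64, 64))(features)

# LSTM1: the sequence runs over the 64 rows; each timestep is one 64-d row.
rows_out = layers.LSTM(256)(matrix)

# LSTM2: swap the row and column axes so the sequence runs over the 64 columns.
columns = layers.Permute((2, 1))(matrix)
cols_out = layers.LSTM(256)(columns)

# Concatenate the last hidden state of each LSTM: 256 + 256 = 512 values.
output = layers.Concatenate()([rows_out, cols_out])

row_col_head = keras.Model(inputs=features, outputs=output)

x = np.random.rand(2, 4096).astype("float32")
print(row_col_head.predict(x, verbose=0).shape)  # (2, 512)
```

To train end to end, the CNN would also be built with the functional API and the `Reshape` applied to its `Dense(4096)` output instead of to a separate `Input`.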

[Discussion]:

Does this work?

Tags: tensorflow keras lstm

[Answer 1]:

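Consider building your CNN model from Conv2D and MaxPool2D layers up to a Flatten layer, because the vectorized output of the Flatten layer is what you will feed into the LSTM part of the architecture.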

So, build your CNN model like this:

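from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten

model_cnn = Sequential()
model_cnn.add(Conv2D(...))        # the first layer also needs input_shape=(height, width, channels)
model_cnn.add(MaxPooling2D(...))
# ... more Conv2D / MaxPooling2D blocks ...
model_cnn.add(Flatten())          # per-frame feature vector that the LSTM will consume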

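Now, here is an interesting point: the current version of Keras is somewhat incompatible with some TensorFlow structures and does not let you stack all of the layers in a single Sequential object.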

So it is time to use a Keras Model object (the functional API) to finish your network:

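from keras.layers import Input, TimeDistributed, LSTM, Dense
from keras.models import Model

input_lay = Input(shape=(None, ?, ?, ?))  # (timesteps, height, width, channels) of your data
# Pass model_cnn to TimeDistributed directly so its weights are tracked and trained with the LSTM
time_distribute = TimeDistributed(model_cnn)(input_lay)  # applies the same CNN to every timestep
lstm_lay = LSTM(?)(time_distribute)
output_lay = Dense(?, activation='?')(lstm_lay)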

Finally, it is time to combine our two separate models:

model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(...)
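A self-contained sketch of the whole pipeline above, with the `?` placeholders filled by hypothetical example sizes (5 frames of 32x32 RGB, a small two-block CNN, 128 LSTM units, 10 softmax classes), assuming TensorFlow 2.x and its bundled Keras. It passes `model_cnn` to `TimeDistributed` directly, so the CNN's weights belong to the combined model and are trained together with the LSTM. Note that this layout treats the input as a sequence of images, with one CNN pass per timestep.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical example sizes: sequences of 5 frames, each a 32x32 RGB image.
TIMESTEPS, HEIGHT, WIDTH, CHANNELS = 5, 32, 32, 3
NUM_CLASSES = 10

# Per-frame CNN that ends in Flatten, as the answer describes.
model_cnn = keras.Sequential([
    keras.Input(shape=(HEIGHT, WIDTH, CHANNELS)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),  # 8 * 8 * 64 = 4096 features per frame
])

# TimeDistributed applies the same CNN (shared weights) to every frame.
input_lay = keras.Input(shape=(TIMESTEPS, HEIGHT, WIDTH, CHANNELS))
time_distribute = layers.TimeDistributed(model_cnn)(input_lay)  # (batch, 5, 4096)
lstm_lay = layers.LSTM(128)(time_distribute)
output_lay = layers.Dense(NUM_CLASSES, activation="softmax")(lstm_lay)

model = keras.Model(inputs=input_lay, outputs=output_lay)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# The CNN's weights are part of the combined model, so fit() updates them too.
print(len(model.trainable_weights))  # 9: 4 from the CNN, 3 from the LSTM, 2 from Dense
```

The trainable-weight count is a quick way to confirm the CNN is actually included in training; if it showed only the LSTM and Dense weights, the CNN would stay frozen.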

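Note: since the vectorized output of the VGG Flatten layer becomes the input to the LSTM model, you can replace my model_cnn example with VGG without its top (fully connected) layers.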
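A sketch of such a VGG-based `model_cnn`, assuming TensorFlow 2.x Keras and the question's 224x224 RGB input. It uses `keras.applications.VGG16` with `include_top=False` and appends a Flatten layer. `weights=None` keeps the example offline; `weights="imagenet"` would load the pretrained ImageNet weights instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# VGG16 convolutional base without the fully connected "top" layers.
# weights=None gives random initialisation; weights="imagenet" would download
# and load the pretrained ImageNet weights instead.
vgg_base = keras.applications.VGG16(include_top=False, weights=None,
                                    input_shape=(224, 224, 3))

# Drop-in replacement for model_cnn: VGG16 conv base followed by Flatten.
model_cnn = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    vgg_base,
    layers.Flatten(),  # (7, 7, 512) feature map -> 25088-d vector per frame
])

frame = np.random.rand(1, 224, 224, 3).astype("float32")
print(model_cnn.predict(frame, verbose=0).shape)  # (1, 25088)
```

This `model_cnn` can be wrapped in `TimeDistributed` like the smaller CNN. A 25088-d vector per frame makes the LSTM's input weights large; passing `pooling="avg"` to `VGG16` (and dropping Flatten) would reduce it to 512 values, but that is a design choice not made in the answer.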
