首先你应该知道,解决视频分类任务的方法比LSTM或任何RNN Cell更适合卷积RNN,就像CNN 比 MLP
更适合图像分类任务
那些 RNN 单元(例如 LSTM、GRU)预计输入形状为 (samples, timesteps, channels),因为您处理的输入形状为 (samples, timesteps, width, height, channels),因此您应该改用 tf.keras.layers.ConvLSTM2D
以下示例代码将向您展示如何构建一个可以处理您的视频分类任务的模型:
import tensorflow as tf
from tensorflow.keras import models, layers
timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5
model = models.Sequential(
[
layers.Input(
shape=(timesteps, width, height, channels)
),
layers.ConvLSTM2D(
filters=64, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool3D(
pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
),
layers.BatchNormalization(),
layers.ConvLSTM2D(
filters=32, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool3D(
pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
),
layers.BatchNormalization(),
layers.ConvLSTM2D(
filters=16, kernel_size=(3, 3), padding="same", return_sequences=False, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool2D(
pool_size=(2, 2), strides=(2, 2), padding="same"
),
layers.BatchNormalization(),
layers.Flatten(),
layers.Dense(256, activation='relu'),
layers.Dense(action_num, activation='softmax')
]
)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
输出:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d (ConvLSTM2D) (None, 60, 192, 192, 64) 150016
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 60, 96, 96, 64) 256
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D) (None, 60, 96, 96, 32) 110720
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 60, 48, 48, 32) 128
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, 48, 48, 16) 27712
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 24, 24, 16) 64
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 256) 2359552
_________________________________________________________________
dense_1 (Dense) (None, 5) 1285
=================================================================
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
_________________________________________________________________
请注意,在输入上述模型之前,您应该将数据重新排序为形状(samples, timesteps, width, height, channels)(即不像np.reshape,而是像np.moveaxis),在您的情况下,形状应该是(120, 60, 192, 192, 1),然后您可以拆分你的 120 视频批量和饲料模型