TimeDistributed(Dense()) vs Dense() after LSTM

【Question title】: TimeDistributed(Dense()) vs Dense() after LSTM
【Posted】: 2022-06-10 17:34:56
【Question】:

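from tensorflow.keras.layers import (Input, Embedding, SpatialDropout1D,
                                     Bidirectional, LSTM, TimeDistributed, Dense)
from tensorflow.keras.models import Model

# max_len, num_words and num_tags are assumed to be defined earlier
input_word = Input(shape=(max_len,))
model = Embedding(input_dim=num_words, output_dim=50, input_length=max_len)(input_word)
model = SpatialDropout1D(0.1)(model)
model = Bidirectional(LSTM(units=100, return_sequences=True, recurrent_dropout=0.1))(model)
out = TimeDistributed(Dense(num_tags, activation="softmax"))(model)
#out = Dense(num_tags, activation="softmax")(model)
model = Model(input_word, out)
model.summary()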

I get the same result whether I use a plain Dense layer or wrap it in TimeDistributed. In which cases should I use TimeDistributed?

【Comments】:

Tags: python keras nlp lstm tensorflow2.0

【Solution 1】:

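TimeDistributed is only necessary for certain layers whose implementation cannot handle extra dimensions. For example, MaxPool2D only works on 2D feature maps, i.e. 4D tensors of shape batch x height x width x channels, and raises an error if you add a time dimension: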

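import tensorflow as tf

tfkl = tf.keras.layers
a = tf.random.normal((16, 32, 32, 3))  # batch x height x width x channels
tfkl.MaxPool2D()(a)  # this works

a = tf.random.normal((16, 5, 32, 32, 3))  # added a 5th (time) dimension
tfkl.MaxPool2D()(a)  # this raises an error: MaxPool2D expects a 4D input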

Here, adding TimeDistributed fixes it:

tfkl.TimeDistributed(tfkl.MaxPool2D())(a)  # works with a being 5d!

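However, many layers already support arbitrary input shapes and automatically distribute the computation over the extra dimensions. Dense is one of them: it is always applied to the last axis of its input and distributed over all other axes, so TimeDistributed is not needed. Indeed, as you noted, it does not change the output at all.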
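To see why the two outputs coincide, here is a minimal NumPy sketch of the arithmetic only. It is not Keras's actual implementation. The shapes and the names `kernel` and `bias` are made up for illustration, and the softmax activation is left out because it is also applied along the last axis and would not change the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: x stands in for an LSTM output with return_sequences=True
batch, time_steps, features, units = 2, 5, 8, 3
x = rng.normal(size=(batch, time_steps, features))
kernel = rng.normal(size=(features, units))  # illustrative Dense weights
bias = rng.normal(size=(units,))

# Dense-style computation on a 3D input: contract the last axis and
# broadcast over the batch and time axes.
dense_out = x @ kernel + bias  # shape (batch, time_steps, units)

# TimeDistributed-style computation: apply the same weights to each
# time step separately, then stack the results along the time axis.
td_out = np.stack(
    [x[:, t, :] @ kernel + bias for t in range(time_steps)],
    axis=1,
)

print(dense_out.shape, td_out.shape)
print(np.allclose(dense_out, td_out))
```

Both paths use the same weights at every time step, so they differ only in how the work is organised, not in the values they produce.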

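It may, however, change how the computation is carried out. I am not sure about this, but I would bet that skipping TimeDistributed and relying on the Dense implementation itself is more efficient.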

【Comments】: