Getting keras LSTM layer to accept two inputs? Answers

【Question title】: Getting keras LSTM layer to accept two inputs?
【Posted】: 2018-05-09 17:35:46
【Question description】:

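I'm working with padded sequences of maximum length 50. I have two types of sequence data:

1) A sequence, seq1, of integers (1-100) that correspond to event types (e.g. [3,6,3,1,45,45....3])

2) A sequence, seq2, of integers representing the time, in minutes, from the last event in seq1. So the last element is zero by definition. For example [100, 96, 96, 45, 44, 12,... 0]. seq1 and seq2 are the same length, 50.

I'm trying to run the LSTM primarily on the event/seq1 data, but have the time/seq2 data strongly influence the forget gate within the LSTM. The reason is that I want the LSTM to really penalize older events and be more likely to forget them. I was thinking about multiplying the forget weight by the inverse of the current value of the time/seq2 sequence. Or maybe (1/seq2_element + 1), to handle the cases where it's zero minutes.

I see where in the keras code (LSTMCell class) the change would have to go: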

f = self.recurrent_activation(x_f + K.dot(h_tm1_f,self.recurrent_kernel_f))

So I need to modify keras' LSTM code to accept multiple inputs. As an initial test, within the LSTMCell class, I changed the call function to look like this:

def call(self, inputs, states, training=None):
    time_input = inputs[1]
    inputs = inputs[0]

So that it can handle two inputs given as a list.

When I try running the model with the Functional API:

# Input 1: event type sequences
# Take the event integer sequences, run them through an embedding layer to get float vectors, then run through LSTM
main_input = Input(shape =(max_seq_length,), dtype = 'int32', name = 'main_input')
x = Embedding(output_dim = embedding_length, input_dim = num_unique_event_symbols, input_length = max_seq_length, mask_zero=True)(main_input)

## Input 2: time vectors 
auxiliary_input = Input(shape=(max_seq_length,1), dtype='float32', name='aux_input')
m = Masking(mask_value = 99999999.0)(auxiliary_input)

lstm_out = LSTM(32)(x, time_vector = m)

# Auxiliary loss here from first input
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

# An arbitrary number of dense, hidden layers here
x = Dense(64, activation='relu')(lstm_out)

# The main output node
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

## Compile and fit the model
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'], loss_weights=[1., 0.2])
print(model.summary())
np.random.seed(21)
model.fit([train_X1, train_X2], [train_Y, train_Y], epochs=1, batch_size=200)

However, I get the following error:

An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 50, 1), ndim=3)]; however `cell.state_size` is (32, 32)

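Any advice?

【Question discussion】:

There seems to be a disconnect between the layer before the LSTM and the LSTM layer. What is the layer before the LSTM? Have you tried it without modifying the Keras code?
Okay, I added the full code of the model. It definitely worked before I modified the Keras code (when I fed only "x" into the LSTM).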

Tags: tensorflow machine-learning deep-learning keras lstm

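【Solution 1】:

You can't pass a list of inputs to the default recurrent layers in Keras. The input_spec is fixed, and the recurrent code is implemented for a single tensor input, as the documentation also points out. In other words, it doesn't magically iterate over 2 inputs with the same timesteps and pass both to the cell. This is partly because of how the iteration is optimised and the assumptions made when the network is unrolled, etc.

If you want 2 inputs, you can pass constants to the cell, which receives that tensor as is. This exists mainly to implement attention models in the future. So 1 input is iterated over timesteps while the other is not. If you really want 2 inputs to be iterated together like zip() in python, you will have to implement a custom layer.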

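【Discussion】:

Ah, I see. Okay, it seems there's no way around it and I need to make that custom layer. Have you come across any examples of custom RNN layers implemented like this? Do you think it would be easier to implement this in TF or in Keras?
A hacky solution would be to concatenate the two time series into one tensor of shape (batch_size, timesteps, features1 + features2) and create a custom cell that is aware of the indices (you can set them in the constructor). The custom cell then gets one input, but you can split it back into 2. This way you can keep using the RNN logic in Keras.
I like this idea! It seems relatively doable as well. If it's not too much trouble, could you suggest what to change in the LSTMCell code: github.com/keras-team/keras/blob/…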
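To make the concatenate-and-split idea concrete, here is a framework-agnostic NumPy sketch of what one step of such a custom cell might compute. It is not the Keras LSTMCell code. The function name `time_decay_lstm_step`, the argument `n_event_features`, and the decay `1 / (minutes + 1)` (one reading of the scaling proposed in the question) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def time_decay_lstm_step(x_t, h_prev, c_prev, W, U, b, n_event_features):
    """One LSTM step on a concatenated input [event features | elapsed minutes]."""
    # Split the single concatenated input back into its two parts.
    event_t = x_t[:, :n_event_features]      # (batch, n_event_features)
    minutes_t = x_t[:, n_event_features:]    # (batch, 1)

    units = h_prev.shape[1]
    # Gate pre-activations, laid out as input, forget, candidate, output.
    z = event_t @ W + h_prev @ U + b         # (batch, 4 * units)
    i = sigmoid(z[:, :units])
    f = sigmoid(z[:, units:2 * units])
    g = np.tanh(z[:, 2 * units:3 * units])
    o = sigmoid(z[:, 3 * units:])

    # Assumed decay: equals 1 at zero minutes and shrinks as the gap grows.
    f = f * (1.0 / (minutes_t + 1.0))

    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Toy usage: 2 sequences, 4 event features, 3 hidden units.
rng = np.random.default_rng(0)
batch, n_feat, units = 2, 4, 3
W = rng.normal(size=(n_feat, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)
minutes = np.array([[0.0], [99.0]])
x_t = np.concatenate([rng.normal(size=(batch, n_feat)), minutes], axis=1)
h, c = time_decay_lstm_step(x_t, np.zeros((batch, units)),
                            np.ones((batch, units)), W, U, b, n_feat)
```

In a real Keras cell the same split would happen at the top of call(), with the split index passed to the constructor as the comment above suggests.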

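【Solution 2】:

I would like to contribute some different ideas here. They do not require you to modify the Keras code.

After the embedding layer for the event types, stack the embeddings with the elapsed time. The Keras layer for this is keras.layers.Concatenate(axis=-1). Imagine that a single event type is mapped to an n-dimensional vector by the embedding layer. You just add the elapsed time as one more dimension after the embedding, so that it becomes an (n+1)-dimensional vector.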
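A minimal sketch of that concatenation with the functional API might look like the following. The sizes `num_unique_event_symbols = 101` and `embedding_length = 16` are assumed values, not taken from the question, and masking of padded steps is left out to keep the example short.

```python
from tensorflow.keras.layers import Input, Embedding, Concatenate, LSTM, Dense
from tensorflow.keras.models import Model

max_seq_length = 50
num_unique_event_symbols = 101   # assumed: event ids 1-100 plus 0 for padding
embedding_length = 16            # assumed embedding size

# Event ids -> embedding vectors of shape (batch, 50, embedding_length)
main_input = Input(shape=(max_seq_length,), dtype='int32', name='main_input')
x = Embedding(input_dim=num_unique_event_symbols,
              output_dim=embedding_length)(main_input)

# Elapsed minutes as one extra feature per timestep: (batch, 50, 1)
auxiliary_input = Input(shape=(max_seq_length, 1), dtype='float32', name='aux_input')

# Stack along the feature axis: (batch, 50, embedding_length + 1)
merged = Concatenate(axis=-1)([x, auxiliary_input])

lstm_out = LSTM(32)(merged)
main_output = Dense(1, activation='sigmoid', name='main_output')(lstm_out)

model = Model(inputs=[main_input, auxiliary_input], outputs=main_output)
```

With this layout the stock LSTM sees the elapsed time as an ordinary input feature, so it can learn how much weight to give it, but nothing forces the forget gate to use it.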

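Another idea, loosely related to your problem and possibly helpful here, is 1D convolution. The convolution can be applied right after the concatenated embedding. The intuition for applying a convolution to event types and elapsed time together is really that of a 1x1 convolution: the two are combined linearly, and the parameters of that combination are trained. Note that in convolution terms the dimensions of the vectors are called channels. Of course, you can also convolve over more than 1 event at a time. Just try it. It may or may not help.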
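The 1x1 intuition can be checked with a small NumPy sketch: a kernel-size-1 convolution is the same linear map applied independently at every timestep. The helper name `pointwise_conv1d` and the sizes (17 input channels, 8 filters) are made up for illustration. In Keras the corresponding layer would be Conv1D(filters, kernel_size=1).

```python
import numpy as np

def pointwise_conv1d(x, kernel, bias):
    """Kernel-size-1 convolution over a sequence.

    x:      (batch, timesteps, in_channels)
    kernel: (in_channels, filters)
    bias:   (filters,)
    """
    # Each timestep's channel vector is multiplied by the same kernel.
    return np.einsum('btc,cf->btf', x, kernel) + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 50, 17))     # e.g. 16 embedding dims + 1 elapsed-time channel
kernel = rng.normal(size=(17, 8))
bias = np.zeros(8)
y = pointwise_conv1d(x, kernel, bias)

# One timestep of the output is just a dense layer applied to that timestep.
same_as_dense = np.allclose(y[0, 3], x[0, 3] @ kernel + bias)
```

A kernel size larger than 1 would mix neighbouring events as well, which is the "more than 1 event at a time" variant mentioned above.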

【Discussion】: