检查目标时出错：预期 activation_29 的形状为 (1,)，但数组的形状为 (3,)答案

【问题标题】：Error when checking target: expected activation_29 to have shape (1,) but got array with shape (3,)检查目标时出错：预期 activation_29 的形状为 (1,)，但数组的形状为 (3,)
【发布时间】：2018-11-10 19:27:15
【问题描述】：

我正在尝试将Keras's memory neural net using the bAbI dataset 从输出单个单词修改为输出多个单词（本例中为 3 个）。对于上下文，这是一个使用 LSTM 进行问答的 NLP 模型。

这是模型结构的一个sn-p：

# placeholders
input_sequence = Input((story_maxlen,))
question = Input((query_maxlen,))

# encoders
# embed the input sequence into a sequence of vectors
input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
                              output_dim=64))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)

# embed the input into a sequence of vectors of size query_maxlen
input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
                              output_dim=query_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)

# embed the question into a sequence of vectors
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
                               output_dim=64,
                               input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)

# encode input sequence and questions (which are indices)
# to sequences of dense vectors
input_encoded_m = input_encoder_m(input_sequence)
input_encoded_c = input_encoder_c(input_sequence)
question_encoded = question_encoder(question)

# compute a 'match' between the first input vector sequence
# and the question vector sequence
# shape: `(samples, story_maxlen, query_maxlen)`
match = dot([input_encoded_m, question_encoded], axes=(2, 2))
match = Activation('softmax')(match)

# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c])  # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response)  # (samples, query_maxlen, story_maxlen)

# concatenate the match matrix with the question vector sequence
answer = concatenate([response, question_encoded])

# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer = LSTM(32)(answer)  # (samples, 32)

# one regularization layer -- more would probably be needed.
answer = Dropout(0.3)(answer)
answer = Dense(vocab_size)(answer)  # (samples, vocab_size)
# we output a probability distribution over the vocabulary
answer = Activation('softmax')(answer)

这就是它的编译和训练方式：

model = Model([input_sequence, question], answer)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit([inputs_train, queries_train], answers_train,
          batch_size=32,
          epochs=num_epochs,
          validation_data=([inputs_test, queries_test], answers_test))

在上面的示例中，answers_train 变量是一个 1xn 矩阵，其中每个项目是一个问题的值。因此，例如，前三个答案：

print(answers_train[:3])

输出：

[16 16 19]

我的问题

这是我对 answer_train 变量所做的更改：

print(answers_train[:3])

输出：

[[ 0  0 16]
 [ 0  0 27]
 [ 0  0 16]]

基本上，我试图预测最多三个单词而不是一个。

当我这样做并尝试训练模型时，我得到了这个错误：

ValueError：检查目标时出错：预期activation_29 有形状 (1,) 但得到了形状 (3,) 的数组

这是model.summary()的输出：

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 552)          0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 5)            0                                            
__________________________________________________________________________________________________
sequential_1 (Sequential)       multiple             2304        input_1[0][0]                    
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 5, 64)        2304        input_2[0][0]                    
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 552, 5)       0           sequential_1[1][0]               
                                                                 sequential_3[1][0]               
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 552, 5)       0           dot_1[0][0]                      
__________________________________________________________________________________________________
sequential_2 (Sequential)       multiple             180         input_1[0][0]                    
__________________________________________________________________________________________________
add_1 (Add)                     (None, 552, 5)       0           activation_1[0][0]               
                                                                 sequential_2[1][0]               
__________________________________________________________________________________________________
permute_1 (Permute)             (None, 5, 552)       0           add_1[0][0]                      
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 5, 616)       0           permute_1[0][0]                  
                                                                 sequential_3[1][0]               
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 32)           83072       concatenate_1[0][0]              
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 32)           0           lstm_1[0][0]                     
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 36)           1188        dropout_4[0][0]                  
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 36)           0           dense_1[0][0]                    
==================================================================================================
Total params: 89,048
Trainable params: 89,048
Non-trainable params: 0
__________________________________________________________________________________________________

我的理解是，模型是为确定单个单词的答案（即形状 (1,)）而构建的，我需要修改模型，因为现在我希望它能够确定多个单词的答案（在这种情况下，形状(3,))。我不明白的是如何改变模型结构来实现这一点。

我在模型摘要中没有看到任何地方表明形状 (1,) 的定义位置。我只看到以单词为单位的最大故事大小 (552)、以单词为单位的最大查询/问题大小 (5) 和以单词为单位的词汇量 (36) 的定义。

有人能帮我找出我做错了什么吗？

更新 #1

在我继续研究这个问题的过程中，我学到了更多东西。由于我不熟悉 ML 和 NN 的细节，所以我可能在所有这些点上都错了，所以如果有任何不妥之处，请随时与我联系。

形状的最后一个密集层 (None, 36) 的大小基于词汇量大小，随后的 softmax 激活层的目的是生成一个概率向量来指示哪个单词是正确的。如果是这种情况，那么通过将最后一个密集层减少到 (None, 3) 我会丢失信息吗？我会不会只得到一个包含三个概率的向量，而没有任何关于它们适用于什么词的指示？除非最后一个密集层是矢量化词汇表的索引？在那种情况下，我会知道被预测的单词，但是后续激活层的目的是什么？
sparse_categorical_crossentropy 损失函数将最终输出的形状减少到 ~/keras/engine/training.py on line 770 中的 (1,)。这是否意味着我使用了错误的损失函数？我不能使用categorical_crossentropy，因为我不想有一个单热矢量输出。这是否意味着我需要完全改变整个模型，或者另一个损失函数会给我想要的输出？

我想总而言之，是否可以对模型进行调整，还是我需要使用完全不同的模型？如果您也能根据上述两点澄清我的困惑，我将不胜感激。

【问题讨论】：

你知道你预测的额外单词应该是什么吗？ bAbI 旨在预测单个单词。
@nuric QA 任务 8 有一些答案是多个单词。
没错，我刚刚检查了数据集，你是对的。

标签： python tensorflow keras lstm

【解决方案1】：

一个简单的解决方案（不承诺性能）将只是添加另外两个具有自己权重的“答案”层，然后编译模型以输出这些。

answer = Dropout(0.3)(answer)

answer_1 = Dense(vocab_size, activation='softmax')(answer)
answer_2 = Dense(vocab_size, activation='softmax')(answer)
answer_3 = Dense(vocab_size, activation='softmax')(answer)

model = Model([input_sequence, question], [answer_1, answer_2, answer_3])

然后将您的标签作为三个(samples,1) 维数组的list 传递，您只需传递即可完成

first, second, third = answers_train.T

作为您的标签。这对于您的应用程序来说可能不够好，您可能需要查看其他 sequence to sequence 模型。

【讨论】：

谢谢，我会试试看会发生什么。虽然，如果模型结构不直接依赖于输出，就像我不需要 5 个输入层来解决 5 个单词的问题一样，那将是理想的。
这不适用于可变输出，尤其是在任务 8 上，如果收集的项目数量发生变化。
@modesitt 另外，我确实查看了您发布的链接，但我并不清楚如何为问答用例修改它。也许我会重新审视并深入了解它的实现。

【解决方案2】：

您将需要可变数量的输出，这需要在预测端使用循环网络。让我们尝试在现有网络上构建一个：

# first we'll add an extra input telling how many outputs we need
num_outs = Input(shape=(1,), dtype='int')
# ... continuing from
answer = LSTM(32)(answer)  # (samples, 32)
# answer is your encoded context-query, we will decode it into a sequence
answers = RepeatVector(num_outs[0])(answer) # (samples, num_outs, 32)
# Another RNN to give us decoding (optional)
answers = LSTM(32, return_sequences=True)(answers) # note return_sequences
answers = TimeDistributed(Dense(vocab_size, activation='softmax'))(answers)
# we have (samples, num_outs, vocab_size) so num_outs words
# ...

现在您的目标也必须是 3D 形状。重要，您必须将 end-of-answer 标记附加到每个答案，以便您知道何时在预测时间停止。同样，您可以填充批次中的答案数量，以获得答案结束标记后的张量。

现在在预测时，您可以要求 10 个单词并在结束答案标记后截取单词，类似于使用 seq2seq 模型完成机器翻译的方式。作为参考，请查看Dynamic Memory Networks。

【讨论】：

谢谢。我已经用一些新信息更新了这个问题。这个答案仍然适用吗？如果我需要一个 end-of-answer 标记，这是否意味着我会像blog.keras.io/… 那样为每个单词运行预测方法。或者，如果答案层具有正确数量的输出值，为什么要使用 end-of-answer 标记？
是的，您的输出现在将是 seq2seq 模型的解码器，因为您现在需要序列。该博客使用教师执法，您可以这样做或不这样做。您需要 end-of-answer 进行预测，否则您如何知道序列何时结束？它只会产生更多的单词。
我刚刚回到这个问题，但我的困惑是：如果它是一个 seq2seq 模型，那么为什么要为输出数量定义一个输入？我可以使用 EoA 令牌来获得我需要的输出数量。反之亦然，如果我有一个输入来定义输出的数量，那么模型是 seq2seq 吗？因为那时我在第一个预测中得到了所需数量的输出。我认为我错过了这一点。这有意义吗？
缺少的部分是你不能在 Keras 中动态迭代 RNN，直到你看到一个令牌，API 被设计成它根据输入的时间步长进行迭代，这可能与输出不同。因此，您可以实现自己的层或修改并手动给出迭代次数。
我想我明白了，但我暂时先不管它，我再玩这个。我必须对您的代码进行两个修改，我将 num_outs[0] 更改为 num_outs[0,0] 并将 TimeDistributed(Dense(vocab_size, activation='softmax'))(answers) 更改为 TimeDistributed(Dense(vocab_size, activation=' softmax'）（答案））。当我尝试训练它时，我得到：Error when checking target: expected lstm_9 to have shape (1,) but got array with shape (3,) 在第一个 LSTM 层。