Posted: 2018-11-10 19:27:15
Problem description:
I'm trying to modify Keras's memory neural net using the bAbI dataset so that it outputs multiple words (three in this example) instead of a single word. For context, this is an NLP model that uses an LSTM for question answering.
Here's a snippet of the model structure:
# placeholders
input_sequence = Input((story_maxlen,))
question = Input((query_maxlen,))
# encoders
# embed the input sequence into a sequence of vectors
input_encoder_m = Sequential()
input_encoder_m.add(Embedding(input_dim=vocab_size,
                              output_dim=64))
input_encoder_m.add(Dropout(0.3))
# output: (samples, story_maxlen, embedding_dim)
# embed the input into a sequence of vectors of size query_maxlen
input_encoder_c = Sequential()
input_encoder_c.add(Embedding(input_dim=vocab_size,
                              output_dim=query_maxlen))
input_encoder_c.add(Dropout(0.3))
# output: (samples, story_maxlen, query_maxlen)
# embed the question into a sequence of vectors
question_encoder = Sequential()
question_encoder.add(Embedding(input_dim=vocab_size,
                               output_dim=64,
                               input_length=query_maxlen))
question_encoder.add(Dropout(0.3))
# output: (samples, query_maxlen, embedding_dim)
# encode input sequence and questions (which are indices)
# to sequences of dense vectors
input_encoded_m = input_encoder_m(input_sequence)
input_encoded_c = input_encoder_c(input_sequence)
question_encoded = question_encoder(question)
# compute a 'match' between the first input vector sequence
# and the question vector sequence
# shape: `(samples, story_maxlen, query_maxlen)`
match = dot([input_encoded_m, question_encoded], axes=(2, 2))
match = Activation('softmax')(match)
# add the match matrix with the second input vector sequence
response = add([match, input_encoded_c]) # (samples, story_maxlen, query_maxlen)
response = Permute((2, 1))(response) # (samples, query_maxlen, story_maxlen)
# concatenate the match matrix with the question vector sequence
answer = concatenate([response, question_encoded])
# the original paper uses a matrix multiplication for this reduction step.
# we choose to use a RNN instead.
answer = LSTM(32)(answer) # (samples, 32)
# one regularization layer -- more would probably be needed.
answer = Dropout(0.3)(answer)
answer = Dense(vocab_size)(answer) # (samples, vocab_size)
# we output a probability distribution over the vocabulary
answer = Activation('softmax')(answer)
And here is how it's compiled and trained:
model = Model([input_sequence, question], answer)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit([inputs_train, queries_train], answers_train,
          batch_size=32,
          epochs=num_epochs,
          validation_data=([inputs_test, queries_test], answers_test))
In the example above, the answers_train variable is a 1×n array where each item is the single-word answer to one question. So, for example, the first three answers:
print(answers_train[:3])
Output:
[16 16 19]
My problem
Here is the change I made to the answers_train variable:
print(answers_train[:3])
Output:
[[ 0  0 16]
 [ 0  0 27]
 [ 0  0 16]]
Basically, I'm trying to predict up to three words instead of one.
When I do this and try to train the model, I get this error:
ValueError: Error when checking target: expected activation_29 to have shape (1,) but got array with shape (3,)
Here is the output of model.summary():
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 552)          0
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 5)             0
__________________________________________________________________________________________________
sequential_1 (Sequential)       multiple             2304        input_1[0][0]
__________________________________________________________________________________________________
sequential_3 (Sequential)       (None, 5, 64)        2304        input_2[0][0]
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 552, 5)       0           sequential_1[1][0]
                                                                 sequential_3[1][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 552, 5)       0           dot_1[0][0]
__________________________________________________________________________________________________
sequential_2 (Sequential)       multiple             180         input_1[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 552, 5)       0           activation_1[0][0]
                                                                 sequential_2[1][0]
__________________________________________________________________________________________________
permute_1 (Permute)             (None, 5, 552)       0           add_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 5, 616)       0           permute_1[0][0]
                                                                 sequential_3[1][0]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 32)           83072       concatenate_1[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 32)           0           lstm_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 36)           1188        dropout_4[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 36)           0           dense_1[0][0]
==================================================================================================
Total params: 89,048
Trainable params: 89,048
Non-trainable params: 0
__________________________________________________________________________________________________
My understanding is that the model was built to determine a single-word answer (i.e., shape (1,)), and that I need to modify the model because I now want it to determine a multi-word answer (in this case, shape (3,)). What I don't understand is how to change the model structure to accomplish this.
I don't see anything in the model summary indicating where the shape (1,) is defined. I only see definitions for the maximum story size in words (552), the maximum query/question size in words (5), and the vocabulary size in words (36).
Can anyone help me figure out what I'm doing wrong?
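For reference, one direction I've been considering is replacing the final Dense/softmax with a per-position output. This is only a minimal sketch of the output head under my assumptions (tensorflow.keras, and a 32-dim input standing in for the existing LSTM/Dropout output), not the full model:

```python
from tensorflow.keras.layers import Input, Dense, RepeatVector, TimeDistributed
from tensorflow.keras.models import Model

vocab_size = 36
answer_len = 3

# Stand-in for the 32-dim vector produced by the existing LSTM/Dropout stack.
encoded = Input((32,))
# Repeat that vector once per answer position, then predict a full
# vocabulary distribution at each of the three positions.
repeated = RepeatVector(answer_len)(encoded)                                  # (None, 3, 32)
answer = TimeDistributed(Dense(vocab_size, activation='softmax'))(repeated)   # (None, 3, 36)

head = Model(encoded, answer)
head.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
print(head.output_shape)  # (None, 3, 36)
```

With an output of shape (None, 3, 36), targets of shape (None, 3) like my modified answers_train should line up, one integer word index per answer position.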
Update #1
As I've continued digging into this problem, I've learned a few more things. Since I'm new to the details of ML and NNs, I could be wrong on all of these points, so please feel free to correct me if anything is off.
- The last dense layer, of shape (None, 36), is sized from the vocabulary, and the purpose of the subsequent softmax activation layer is to produce a probability vector indicating which word is correct. If that's the case, wouldn't reducing the last dense layer to (None, 3) lose information? Wouldn't I just get a vector of three probabilities with no indication of which words they apply to? Unless the last dense layer indexes into the vectorized vocabulary? In that case I'd know which words were being predicted, but then what would the purpose of the subsequent activation layer be?
- The sparse_categorical_crossentropy loss function reduces the shape of the final output to (1,) at line 770 of ~/keras/engine/training.py. Does that mean I'm using the wrong loss function? I can't use categorical_crossentropy because I don't want a one-hot vector output. Does this mean I need to change the entire model, or would another loss function give me the output I want?
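On this second point, my current understanding is that sparse_categorical_crossentropy can handle one integer target per timestep when the prediction is 3-D. A small sketch, using tf.keras's loss function directly with made-up uniform predictions:

```python
import numpy as np
import tensorflow as tf

# Made-up uniform predictions: batch of 2, three answer positions, vocab of 5.
y_pred = tf.constant(np.full((2, 3, 5), 0.2), dtype=tf.float32)
# Integer word indices, one per answer position -- no one-hot encoding needed.
y_true = tf.constant([[1, 2, 3], [0, 4, 2]])

loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
print(loss.shape)  # (2, 3): one loss value per answer position
```

Each entry is -log(0.2) ≈ 1.609 here, since every prediction assigns probability 0.2 to the true word.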
I suppose, in summary: can the model be tweaked, or do I need a completely different model? I'd also appreciate it if you could clear up my confusion on the two points above.
Comments:
- Do you know what the extra words you're predicting should be? bAbI was designed around predicting a single word.
- @nuric Some of the answers in QA task 8 are multiple words.
- Right, I just checked the dataset and you're correct.
Tags: python tensorflow keras lstm