[Posted]: 2021-03-09 17:42:22
[Problem description]:
I am stepping through the code at https://www.tensorflow.org/tutorials/text/nmt_with_attention as a way of learning it, and I am confused about when the loss function is called and what gets passed to it. I added two print statements to loss_function, and when the training loop runs it only prints
(64,) (64, 4935)
a number of times right at the start, and then nothing more. I am confused on two points:
- Why isn't loss_function() called (and the shapes printed) over and over as the training loop runs? I expected the loss function to be called at the end of every batch of size 64. (A small standalone sketch of the print behaviour I am seeing follows this list.)
- I expected real to have shape (batch_size, time_steps) and pred to have shape (batch_size, time_steps, vocab_size). Instead it looks like the loss is called separately at each time step (64 is the batch size and 4935 is the vocabulary size); a tiny shape check is included after the code below.
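As far as I can tell, a Python print inside a @tf.function only fires while the function is being traced, which may be related to what I am seeing. Here is a minimal standalone sketch of that behaviour (the function name traced_step and the sizes are made up for illustration, they are not from the tutorial):

import tensorflow as tf

@tf.function
def traced_step(x):
    # Python-level print: runs only while TensorFlow traces the function,
    # not on every execution of the compiled graph
    print('tracing, x.shape =', x.shape)
    # tf.print: becomes part of the graph and runs on every call
    tf.print('executing, shape =', tf.shape(x))
    return x * 2

for _ in range(3):
    traced_step(tf.ones((64, 4935)))
# 'tracing, ...' is printed once; 'executing, ...' is printed three times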
The parts I think are relevant are reproduced below.
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    # the two print statements I added for debugging
    print(real.shape)
    print(pred.shape)
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask  # set padding entries to zero loss
    return tf.reduce_mean(loss_)
@tf.function
def train_step(inp, targ, enc_hidden):
    loss = 0

    with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden
        dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)

        # Teacher forcing - feeding the target as the next input
        for t in range(1, targ.shape[1]):
            # passing enc_output to the decoder
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            print(targ[:, t])
            print(predictions)
            loss += loss_function(targ[:, t], predictions)
            # using teacher forcing
            dec_input = tf.expand_dims(targ[:, t], 1)

    batch_loss = (loss / int(targ.shape[1]))
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))

    return batch_loss
EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()

    enc_hidden = encoder.initialize_hidden_state()
    total_loss = 0

    for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        # print(batch)
        batch_loss = train_step(inp, targ, enc_hidden)
        total_loss += batch_loss

        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.numpy()))
    # saving (checkpoint) the model every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)

    print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                        total_loss / steps_per_epoch))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
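To make the shape question concrete, here is a tiny standalone check (dummy tensors; BATCH_SIZE and VOCAB_SIZE chosen only to match the 64 and 4935 above) of what I understand a single time step's call to loss_object to look like:

import tensorflow as tf

BATCH_SIZE, VOCAB_SIZE = 64, 4935
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

# one time step, as I understand the teacher-forcing loop:
real_t = tf.zeros((BATCH_SIZE,), dtype=tf.int32)      # stands in for targ[:, t]
pred_t = tf.random.normal((BATCH_SIZE, VOCAB_SIZE))   # stands in for predictions
per_example_loss = loss_object(real_t, pred_t)

print(real_t.shape, pred_t.shape, per_example_loss.shape)
# (64,) (64, 4935) (64,)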
[Question discussion]:
Tags: tensorflow keras tensorflow2.0