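【Posted】: 2021-08-01 17:30:35
【Question】:
I have a custom training loop that I am trying to profile in order to debug some memory issues. I also use a file writer to capture training information, which I then display in TensorBoard: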
import datetime
import time

import tensorflow as tf

# Both functions below are methods of the model wrapper class (hence `self`).
@tf.function
def train_step(self, optimizer, data, train_loss_object):
    with tf.GradientTape() as tape:
        audio_data, transcrs, audio_lengths, label_lengths = data
        logits = self.model(audio_data, training=True)
        ctc_loss = self.loss_function(logits, transcrs, audio_lengths, label_lengths)
    # Compute and apply gradients outside the tape context.
    grads = tape.gradient(ctc_loss, self.model.trainable_variables)
    optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
    train_loss_object(ctc_loss)

def train_full(self, train_dataset, num_epochs=5, batch_size=8):
    current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    train_log_dir = 'logs/' + current_time + '/train'
    train_summary_writer = tf.summary.create_file_writer(train_log_dir)
    optimizer = tf.optimizers.Adam()
    train_loss_object = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
    start = time.time()
    # Profile the whole training run; profiler output goes to 'logs/'.
    with tf.profiler.experimental.Profile('logs/'):
        for epoch in range(num_epochs):
            for data in train_dataset:
                self.train_step(optimizer, data, train_loss_object)
            # Write the mean loss for this epoch as a scalar summary.
            with train_summary_writer.as_default():
                tf.summary.scalar('ctc_loss', train_loss_object.result(), step=epoch)
            stop = time.time()
            print(f'Epoch {epoch}, Train Loss: {train_loss_object.result()}, Time elapsed: {stop-start} seconds')
            train_loss_object.reset_states()
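This gives me a folder named logs that contains two folders: one holds the training loss, and the other, plugins/profiler, holds the .pb profile files.
When I point TensorBoard at the logs folder with tensorboard --logdir logs, the scalars display fine, but when I go to the dropdown menu and click PROFILE, nothing is shown. What am I missing?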
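For anyone trying to reproduce this, a quick sanity check is to list what the profiler actually wrote under the log directory. The snippet below is only a sketch: `find_profile_files` and `profile_plugin_installed` are hypothetical helper names, the search simply matches any `.pb` file below a directory called `plugins`, and the module name `tensorboard_plugin_profile` is an assumption about how the separately installed profile plugin is imported.

```python
import importlib.util
from pathlib import Path


def find_profile_files(logdir):
    """Return .pb files found below any `plugins` subdirectory of logdir.

    Hypothetical helper: paths are returned relative to logdir, POSIX style.
    """
    root = Path(logdir)
    found = []
    for path in root.rglob("*.pb"):
        rel = path.relative_to(root)
        # Keep only files that sit somewhere below a `plugins` directory.
        if path.is_file() and "plugins" in rel.parts[:-1]:
            found.append(rel.as_posix())
    return sorted(found)


def profile_plugin_installed():
    """Best-effort check; the module name used here is an assumption."""
    return importlib.util.find_spec("tensorboard_plugin_profile") is not None


if __name__ == "__main__":
    for name in find_profile_files("logs"):
        print(name)
    print("profile plugin importable:", profile_plugin_installed())
```

If the first part prints profile files but the last line prints False, a missing plugin package would be one thing to rule out before looking at the browser or the directory layout.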
【Discussion】:
- Which browser are you using? Firefox? If so, try Chrome instead; see github.com/tensorflow/tensorboard/issues/2874
Tags: python tensorflow tensorboard profiler