[Posted]: 2016-07-23 20:29:24
[Question]:
I'm working through TensorFlow's multi-GPU Inception code (on a single machine). I'm confused because, as I understand it, we get a separate loss from each tower (i.e., each GPU), yet the `loss` variable that gets evaluated appears to be only the last tower's loss, not the sum of the losses from all towers:
for step in xrange(FLAGS.max_steps):
  start_time = time.time()
  _, loss_value = sess.run([train_op, loss])
  duration = time.time() - start_time
where `loss` was last defined separately for each tower:
for i in xrange(FLAGS.num_gpus):
  with tf.device('/gpu:%d' % i):
    with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
      # Force all Variables to reside on the CPU.
      with slim.arg_scope([slim.variables.variable], device='/cpu:0'):
        # Calculate the loss for one tower of the ImageNet model. This
        # function constructs the entire ImageNet model but shares the
        # variables across all towers.
        loss = _tower_loss(images_splits[i], labels_splits[i], num_classes,
                           scope)
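Note that the behavior described here is ordinary Python rebinding: `loss` is reassigned on every loop iteration, so after the loop the name refers only to the op built for the final tower. A minimal, TensorFlow-free sketch of that rebinding (the `_tower_loss` stand-in below is hypothetical, not the real function):

```python
def _tower_loss(i):
    # Stand-in for the real _tower_loss: returns a label for tower i.
    return 'loss_tower_%d' % i

num_gpus = 4
for i in range(num_gpus):
    loss = _tower_loss(i)  # rebinds `loss` on every iteration

# After the loop, `loss` refers only to the last tower's value.
print(loss)  # prints "loss_tower_3"
```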
Can someone explain the step where the losses from the different towers are combined? Or is the loss of a single tower also representative of the losses of the other towers?
Here is a link to the code: https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L336
[Discussion]:
Tags: neural-network tensorflow conv-neural-network