Multi GPU Training in Tensorflow (Data Parallelism) when Using feed_dict: answers

【Question title】: Multi GPU Training in Tensorflow (Data Parallelism) when Using feed_dict
【Posted】: 2017-04-05 21:19:48
【Question】:

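I would like to use multiple GPUs to train my Tensorflow model, taking advantage of data parallelism.

I am currently training a Tensorflow model using the following approach: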

x_ = tf.placeholder(...)
y_ = tf.placeholder(...)
y = model(x_)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
optimizer = tf.train.AdamOptimizer()
train_op = tf.contrib.training.create_train_op(loss, optimizer)
for i in epochs:
   for b in data:
      _ = sess.run(train_op, feed_dict={x_: b.x, y_: b.y})

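I would like to take advantage of multiple GPUs to train this model in a data-parallel manner, i.e. I would like to split each batch into two halves and run each half batch on one of my two GPUs.

cifar10_multi_gpu_train seems to provide a good example of building a loss that draws from graphs running on multiple GPUs, but I haven't found a good example of this style of training that uses feed_dict and placeholder rather than a data loader queue.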

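Update

It looks like https://timsainb.github.io/multi-gpu-vae-gan-in-tensorflow.html might be a good example. They seem to pull average_gradients from cifar10_multi_gpu_train.py and create one placeholder, which they then slice for each GPU. I think you also need to split create_train_op into three stages: compute_gradients, average_gradients and apply_gradients.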
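
To make the middle stage concrete, here is a framework-free sketch of what an average_gradients step does. It is not the code from cifar10_multi_gpu_train.py: gradients are plain lists of floats standing in for tensors, variables are represented by hypothetical string names ("w", "b"), and every tower is assumed to list its variables in the same order.

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads holds one list per tower of (gradient, variable_name)
    pairs, in the same variable order for every tower. Gradients are
    flat lists of floats standing in for tensors (an assumption made
    only for this sketch).
    """
    averaged = []
    # zip(*tower_grads) regroups the pairs so that each item holds one
    # variable's (gradient, name) pair from every tower.
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        n = len(grads)
        # Element-wise mean over the towers.
        mean = [sum(values) / n for values in zip(*grads)]
        # The variable is shared, so its name is taken from the first tower.
        averaged.append((mean, grads_and_vars[0][1]))
    return averaged


# Two towers, each with gradients for the hypothetical variables "w" and "b".
tower_0 = [([1.0, 2.0], "w"), ([0.5], "b")]
tower_1 = [([3.0, 4.0], "w"), ([1.5], "b")]
averaged = average_gradients([tower_0, tower_1])
```

In a real graph, each tower's list would come from optimizer.compute_gradients on that tower's loss, and the averaged list would be passed to optimizer.apply_gradients.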

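【Comments】:

I managed to reuse the cifar example on mnist. It is slow and the results are not exciting. github.com/normanheckscher/mnist-multi-gpu
Your case is very similar to the example on this page: github.com/vahidk/EffectiveTensorflow. The second example of make parallel should work for you.
@BoShao - can you move your comment to an answer? Or, Alex, did you manage to answer your own question with the link you found, and should we turn that into an answer?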

Tags: tensorflow

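【Solution 1】:

I know of three ways to feed data to a multi-GPU model.

1. If all of the inputs have the same shape, you can build the placeholder x on the CPU and then use tf.split to split x into xs. Then, on each GPU tower, feed in xs[i].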

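with tf.device("/cpu:0"):
    encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
    encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")

    # make sure batch % num_gpus == 0; tf.split needs the number of splits
    inputs = tf.split(encoder_inputs, num_gpus, axis=0)  # axis=0, split on batch dimension
    lens = tf.split(encoder_length, num_gpus, axis=0)

tower_losses = []  # keep every tower's loss instead of overwriting it
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d"%i):
            with tf.name_scope("tower_%d"%i):
                loss = compute_loss(inputs[i], lens[i])
                tower_losses.append(loss)
                # as in cifar10_multi_gpu_train.py: later towers reuse the
                # variables created by the first tower (via tf.get_variable)
                tf.get_variable_scope().reuse_variables()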
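
As a framework-free illustration of the even split this approach relies on, the sketch below mimics splitting a batch along the batch dimension into equal shards and checks the "batch % num_gpu == 0" condition from the comment above. The helper name split_batch is hypothetical, and plain lists stand in for tensors.

```python
def split_batch(batch, num_gpus):
    """Split a batch (a list of examples) into num_gpus equal shards."""
    if len(batch) % num_gpus != 0:
        raise ValueError(
            "batch size %d is not divisible by %d GPUs" % (len(batch), num_gpus))
    shard_size = len(batch) // num_gpus
    # Shard i holds the i-th contiguous slice along the batch dimension.
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_gpus)]


batch = list(range(8))          # a toy batch of 8 examples
shards = split_batch(batch, 2)  # shards[i] would be fed to tower i
```

With this approach, the whole batch is still fed through the single CPU placeholder, so the feed_dict in the training loop does not need to change.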

2. If your inputs have different shapes, you need to build the placeholder x on each GPU, inside a scope.


def init_placeholder(self):
    with tf.variable_scope("inputs"):   # use a scope
        encoder_inputs = tf.placeholder(tf.int32, [None, None], name="encoder_inputs")
        encoder_length = tf.placeholder(tf.int32, [None,], name="encoder_length")
    return encoder_inputs, encoder_length

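tower_inputs = []  # (x, x_len) placeholders of each tower, needed later for feed_dict
with tf.variable_scope(tf.get_variable_scope()):
    for g, gpu in enumerate(GPUS):
        with tf.device("/gpu:%d"%gpu):
            with tf.name_scope("tower_%d"%g):
                x, x_len = model.init_placeholder()  # these placeholder tensors are created on the GPU
                tower_inputs.append((x, x_len))
                loss = model.compute_loss(x, x_len)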
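
With one placeholder pair per tower, each sess.run call has to feed every tower's placeholders. The following is a minimal sketch of assembling such a feed_dict, assuming the (x, x_len) pair created in each tower has been kept in a list. The helper build_feed_dict is hypothetical, and strings stand in for the placeholder tensors so that the sketch runs without TensorFlow.

```python
def build_feed_dict(tower_placeholders, tower_batches):
    """Map each tower's (x, x_len) placeholders to that tower's batch."""
    if len(tower_placeholders) != len(tower_batches):
        raise ValueError("need exactly one batch per tower")
    feed_dict = {}
    for (x_ph, len_ph), (x, x_len) in zip(tower_placeholders, tower_batches):
        feed_dict[x_ph] = x
        feed_dict[len_ph] = x_len
    return feed_dict


# Strings stand in for the placeholder tensors of two towers.
placeholders = [("tower_0/x", "tower_0/x_len"), ("tower_1/x", "tower_1/x_len")]
# Each tower gets its own padded batch; the padded lengths differ (3 vs 2).
batches = [
    ([[1, 2, 3], [4, 5, 0]], [3, 2]),
    ([[6, 7], [8, 0]], [2, 1]),
]
feed_dict = build_feed_dict(placeholders, batches)
```

Because each tower has its own placeholders, the batches do not need to share a padded length, which is the reason to prefer this approach when input shapes differ.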

3. Use tf.data.Dataset to feed the data. Google's official cifar10_multi_gpu_train.py uses a Queue, which is similar to this approach.

【Discussion】: