【Title】: How to accumulate gradients in tensorflow?
【Posted】: 2018-03-28 02:45:01
【Question】:

I have a question similar to this one.

Because my resources are limited and I am using a deep model (VGG-16) to train a triplet network, I want to accumulate the gradients of 128 training examples fed with a batch size of one, and only then propagate the error and update the weights.

It is not clear to me how to do this. I work with TensorFlow, but any implementation/pseudocode is welcome.

【Comments】:

  • Why don't you use the answers to the question you linked?
  • @Pop Because I did not understand them. I am looking for something more detailed (beginner level).

Tags: tensorflow conv-neural-network gradient-descent


【Solution 1】:

Let's take a look at the code proposed in one of the answers to the question you linked:

## Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()

## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Create a list of non-trainable variables with the same shapes as the
## trainable ones, initialized with zeros (these are the accumulators)
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]

## Call the optimizer's compute_gradients function to obtain the list of
## (gradient, variable) pairs ('rmse' here stands for whatever loss you defined)
gvs = opt.compute_gradients(rmse, tvs)

## Add each gradient to the corresponding accumulator (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]

## Define the training step (part with variable value update)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

This first part basically adds new variables and ops to your graph, which allow you to

  1. accumulate the gradients in the variables accum_vars (a list) with the op accum_ops (note that this sums the gradients; see the averaging sketch after this list)
  2. update the model weights with the op train_step
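
One detail worth noting: because accum_ops sums the gradients, accumulating 128 batches of size one gives a gradient 128 times larger than a single averaged batch of 128 examples would. If you want to apply the mean gradient instead, a minimal sketch is to divide the accumulators when applying them (n_minibatches, e.g. 128, is an assumed variable that is not in the original answer):

## Optional: apply the average of the accumulated gradients instead of the sum
## (sketch; n_minibatches, e.g. 128, is assumed to be defined elsewhere)
train_step = opt.apply_gradients(
    [(accum_vars[i] / n_minibatches, gv[1]) for i, gv in enumerate(gvs)])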

Then, to use it during training, you have to follow these steps (still from the answer you linked):

## The while loop for training
while ...:
    # Run zero_ops to reset the accumulators to zero
    sess.run(zero_ops)
    # Accumulate the gradients 'n_minibatches' times in accum_vars using accum_ops
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    # Run the train_step op to update the weights based on the accumulated gradients
    sess.run(train_step)
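
If you prefer a fully self-contained, beginner-level example, here is a minimal runnable sketch that wires all of the above together. The toy one-layer model, the placeholder shapes, the random data, and n_minibatches = 128 are illustrative assumptions, not part of the original answer:

import numpy as np
import tensorflow as tf

n_minibatches = 128

## Toy model standing in for your real network (e.g. VGG-16)
X = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(X, 1)
rmse = tf.sqrt(tf.reduce_mean(tf.square(pred - y)))

## Same accumulation machinery as above
opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False)
              for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
gvs = opt.compute_gradients(rmse, tvs)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

## Random data standing in for 128 training examples fed one at a time
Xs = np.random.randn(n_minibatches, 1, 4).astype(np.float32)
ys = np.random.randn(n_minibatches, 1, 1).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(zero_ops)
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    sess.run(train_step)  # one weight update from 128 accumulated gradients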

【Comments】:

  • So you left sess.run(train_step) outside the inner loop. Does that mean the weight update happens after the gradients of the last batch are computed? And if we put it inside the loop, it would happen after every epoch, right?
【Solution 2】:

Tensorflow 2.0 Compatible Answer: In line with Pop's answer above and the explanation provided on the Tensorflow Website, below is code for accumulating gradients in Tensorflow 2.0:

def train(epochs):
  tvs = mnist_model.trainable_variables
  # Create the accumulators once, outside the training loops (creating them
  # per batch, as in the original snippet, would reset them on every step)
  accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]
  for epoch in range(epochs):
    # Zero the accumulators at the start of each epoch
    for accum_var in accum_vars:
      accum_var.assign(tf.zeros_like(accum_var))
    for (batch, (images, labels)) in enumerate(dataset):
      with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)
        loss_value = loss_object(labels, logits)

      loss_history.append(loss_value.numpy().mean())
      grads = tape.gradient(loss_value, tvs)
      # Add this batch's gradients to the accumulators
      for i, grad in enumerate(grads):
        accum_vars[i].assign_add(grad)

    # Apply the accumulated gradients (not just the last batch's grads)
    optimizer.apply_gradients(zip(accum_vars, tvs))
    print('Epoch {} finished'.format(epoch))

# call the above function
train(epochs=3)

The complete code can be found in this Github Gist.
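
If you want to drive the accumulation from inside tf.function, here is a sketch under assumptions (it reuses the mnist_model, loss_object, and optimizer names from above). The key constraint is that the accumulator variables must be created once, outside the traced functions, because tf.function does not allow creating a new tf.Variable on every call:

# Create the accumulators once, outside any tf.function
accum_vars = [tf.Variable(tf.zeros_like(v), trainable=False)
              for v in mnist_model.trainable_variables]

@tf.function
def accumulate_gradients(images, labels):
    # Compute this batch's gradients and add them to the accumulators
    with tf.GradientTape() as tape:
        loss_value = loss_object(labels, mnist_model(images, training=True))
    grads = tape.gradient(loss_value, mnist_model.trainable_variables)
    for accum_var, grad in zip(accum_vars, grads):
        accum_var.assign_add(grad)
    return loss_value

@tf.function
def apply_accumulated_gradients():
    # Update the weights from the accumulated gradients, then reset them
    optimizer.apply_gradients(zip(accum_vars, mnist_model.trainable_variables))
    for accum_var in accum_vars:
        accum_var.assign(tf.zeros_like(accum_var))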

【Comments】:

  • It should be optimizer.apply_gradients(zip(accum_ops, mnist_model.trainable_variables))
  • I also can't create a tf.Variable inside a tf.function, any suggestions?
  • I also had trouble following this code; I posted a working version in the linked question: stackoverflow.com/a/62683800/8671242