【Title】Can I sum the gradients in each training iteration?
【Posted】2019-08-07 17:53:57
【Description】

I need to sum the gradients from each training iteration and then transfer the summed gradients to another process, so that the other process can reproduce the learned network.

The key code is shown below. Method one:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super(Net, self).__init__()

        self.l1 = nn.Linear(state_dim, 40)
        self.l2 = nn.Linear(40, 30)
        self.l3 = nn.Linear(30, action_dim)

        self.max_action = max_action

    def forward(self, x):
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = self.max_action * torch.tanh(self.l3(x))
        return x

def train(batches,state_dim, action_dim, max_action):
    actor = Net(state_dim, action_dim, max_action)
    critic = Net(state_dim, action_dim, max_action)

    for i in range(1000):
        ...
        #Compute critic loss
        critic_loss = F.mse_loss(current_Q, target_Q)

        # Optimize the critic
        critic_optimizer.zero_grad()
        critic_loss.backward()
        critic_optimizer.step()

        # Compute actor loss
        actor_loss = -critic(state, actor(state)).mean()

        # Optimize the actor
        actor_optimizer.zero_grad()
        actor_loss.backward()
        actor_optimizer.step()



    return actor  # the learned actor network

...
net = train(batches,state_dim, action_dim, max_action)

Method two:

...

def train(batches,state_dim, action_dim, max_action):
    net = Net(state_dim, action_dim, max_action)
    for i in range(1000):
        ...

        # Optimize the critic
        critic_optimizer.zero_grad()
        critic_loss.backward()
        sum_grads(critic)  # accumulate the critic's gradients into sum_grads

    # after the loop, apply the accumulated gradients in a single step
    for g, p in zip(sum_grads, net.parameters()):
        p.grad = torch.from_numpy(g)
    net_optimizer.step()

    return net

...
net = train(batches,state_dim, action_dim, max_action)

I expected method one and method two to learn the same network parameters, but they don't. So my question is: why not, and how can I make it work? Thanks in advance.

【Comments】

  • Please post your code or algorithm, to help us help you.
  • Hey, I've posted some code now. Thanks.

标签: neural-network gradient pytorch reinforcement-learning backpropagation


【Solution 1】

There is no need to sum the gradients explicitly: PyTorch already accumulates gradients in each parameter's `.grad` across successive `backward()` calls. Call `zero_grad()` once, run the forward and backward passes inside the loop without calling `step()`, and then call `step()` once at the end. (Also note why your two methods disagree: in method one each gradient is evaluated at parameters that have already been updated by the previous steps, whereas a summed gradient is evaluated at one fixed set of parameters, so a single step on the sum cannot reproduce the sequence of steps.)

model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
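To see that gradients really do sum across `backward()` calls, here is a self-contained sketch (a toy `nn.Linear` model and random data are assumptions standing in for the actual networks and `batches`): accumulating gradients over four mini-batches and stepping once gives, with plain SGD, exactly the same parameters as one step on the full batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# toy data split into 4 mini-batches (hypothetical stand-in for `batches`)
data = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(4)]

def make_model():
    torch.manual_seed(1)  # identical initial weights for both runs
    return nn.Linear(3, 1)

# (a) accumulate gradients over all mini-batches, then step once
m1 = make_model()
opt1 = torch.optim.SGD(m1.parameters(), lr=0.1)
opt1.zero_grad()
for x, y in data:
    # divide by the number of mini-batches so the sum equals the full-batch mean
    (F.mse_loss(m1(x), y) / len(data)).backward()
opt1.step()

# (b) a single step on the full batch
m2 = make_model()
opt2 = torch.optim.SGD(m2.parameters(), lr=0.1)
opt2.zero_grad()
x_all = torch.cat([x for x, _ in data])
y_all = torch.cat([y for _, y in data])
F.mse_loss(m2(x_all), y_all).backward()
opt2.step()

print(torch.allclose(m1.weight, m2.weight, atol=1e-6))  # True
```

With an optimizer that keeps internal state (Adam, momentum SGD), the equality only holds because a single step is taken in both cases.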

【Comments】

  • Thanks. However, in each iteration I have to update the network parameters (it's an actor-critic architecture: the freshly updated critic is used to help update the actor's parameters). How do I handle that case?
  • I don't understand. If you want to accumulate your gradients over a few mini-batches and apply them together (e.g. because a large batch would run out of memory), you can use my code; that is exactly what you are trying to do in your second method. Otherwise, your first method is correct, and it works.
  • Thanks. I've edited the example code. During training, the critic network is updated first, and then it is used to train the actor network. So it seems necessary to update the parameters in every iteration.
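If you do need to step every iteration (as actor-critic requires) and still collect the per-iteration gradients for another process, one option is to clone each parameter's `.grad` right after `backward()` and before `step()`. A minimal sketch, with a hypothetical `accumulate_grads` helper and a toy `nn.Linear` standing in for the critic (note the caveat from the answer still applies: replaying the summed gradient as one step will not reproduce method one exactly, because each gradient was computed at different parameter values):

```python
import torch
import torch.nn as nn

def accumulate_grads(model, grad_sums):
    """Add a detached copy of each parameter's current gradient to grad_sums."""
    grads = [p.grad.detach().clone() for p in model.parameters()]
    if not grad_sums:
        grad_sums.extend(grads)          # first iteration: start the running sums
    else:
        for s, g in zip(grad_sums, grads):
            s += g                       # element-wise running sum per parameter

torch.manual_seed(0)
net = nn.Linear(4, 2)                    # stand-in for the critic network
opt = torch.optim.SGD(net.parameters(), lr=0.1)
grad_sums = []

for _ in range(3):
    opt.zero_grad()
    loss = net(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    accumulate_grads(net, grad_sums)     # snapshot before the parameters change
    opt.step()                           # the optimizer still steps every iteration

# grad_sums now holds one summed-gradient tensor per parameter
# (weight then bias for nn.Linear), ready to send to another process
print(len(grad_sums), grad_sums[0].shape)  # 2 torch.Size([2, 4])
```

The tensors in `grad_sums` can be converted with `.numpy()` if the receiving process expects NumPy arrays, as in your method two.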