【Posted】: 2021-03-20 22:18:33
【Problem description】:
"Gradients are not updated, but accumulated, and the update only happens every N steps." My question is how the gradients are accumulated in the code snippet below: in every iteration of the loop, a new gradient is computed by loss.backward() and, as far as I understand, stored somewhere internally. Is that internally stored gradient flushed on the next iteration? How exactly are the gradients summed up and then applied only once every N steps?
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                # Forward pass
    loss = loss_function(predictions, labels)  # Compute loss function
    loss = loss / accumulation_steps           # Normalize our loss (if averaged)
    loss.backward()                            # Backward pass
    if (i + 1) % accumulation_steps == 0:      # Wait for several backward steps
        optimizer.step()                       # Now we can do an optimizer step
        model.zero_grad()                      # Reset gradient tensors
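To see what "accumulated" means here, below is a minimal, self-contained sketch (my own toy example, not part of the quoted training loop) showing that each backward() call adds the new gradient into a parameter's .grad buffer, and that nothing clears this buffer until it is explicitly zeroed:

import torch

# Toy parameter; in the loop above this role is played by the model's parameters.
w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()   # d(2w)/dw = 2
print(w.grad)        # tensor(2.)

(3 * w).backward()   # the new gradient is ADDED to the stored one: 2 + 3
print(w.grad)        # tensor(5.)

w.grad.zero_()       # only an explicit reset (e.g. model.zero_grad()) clears it
print(w.grad)        # tensor(0.)

So calling optimizer.step() only every accumulation_steps iterations applies the sum of the last N gradients, which is why model.zero_grad() has to be called right after the step and not on every iteration.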
【Discussion】: