How to break PyTorch autograd with in-place ops

【Question title】: How to break PyTorch autograd with in-place ops
【Posted】: 2020-07-11 22:49:46
【Question】:

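I am trying to get a better understanding of the role of in-place operations in PyTorch autograd. My understanding is that they can cause problems because they may overwrite values needed during the backward pass.

I am trying to build an example where an in-place operation breaks automatic differentiation. My idea is to overwrite a value that backpropagation needs, after that value has already been used to compute another tensor.

I am using item assignment as the in-place operation (I tried += with the same result). I double-checked that it is an in-place op like this: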

x = torch.arange(5, dtype=torch.float, requires_grad=True)
y = x
y[3] = -1
print(x)

This prints:

tensor([ 0.,  1.,  2., -1.,  4.], grad_fn=<CopySlices>)
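For reference, the "is it really in-place?" check can also be done without reading the printed `grad_fn`. The following is a minimal sketch, not part of the original question. It assumes a tensor that does not require grad, so that it does not depend on how a given PyTorch version treats in-place writes to leaf tensors. The variable names `ptr_before`, `same_object` and `same_storage` are made up for illustration.

```python
import torch

x = torch.arange(5, dtype=torch.float)
y = x                        # plain Python alias: no copy is made
ptr_before = x.data_ptr()    # address of the underlying storage

y[3] = -1                    # item assignment writes into that storage

same_object = y is x                       # y and x are the same tensor object
same_storage = x.data_ptr() == ptr_before  # storage was not reallocated
print(same_object, same_storage, x)
```

If both flags are true and `x` shows the new value, the assignment modified the existing tensor rather than creating a new one.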

Here is my attempt to break autograd.

Without the in-place op:

x = torch.arange(5, dtype=torch.float, requires_grad=True)
out1 = x ** 2
out2 = out1 / 10
# out1[3] += 100  
out2.sum().backward()
print(x.grad)

which prints:

tensor([0.0000, 0.2000, 0.4000, 0.6000, 0.8000])

With the in-place op:

x = torch.arange(5, dtype=torch.float, requires_grad=True)
out1 = x ** 2
out2 = out1 / 10
out1[3] = 0  
out2.sum().backward()
print(x.grad)

which prints:

tensor([0.0000, 0.2000, 0.4000, 0.6000, 0.8000])

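I was expecting to get different grads.

What is the item assignment doing? I don't get the grad_fn=<CopySlices>.
Why does it return the same grads?
Is there a working example of an in-place operation that breaks autograd?
Is there a list of PyTorch ops that are not backward-compatible?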

Tags: python pytorch autograd

【Answer 1】:

A working example of an in-place operation that breaks autograd:

  x = torch.ones(5, requires_grad=True)
  x2 = (x + 1).sqrt()
  z = (x2 - 10)
  x2[0] = -1
  z.sum().backward()

This raises:

RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation: [torch.FloatTensor [5]], which is output 0 of SqrtBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
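The "version 1; expected version 0" part of the message refers to a per-tensor counter that is incremented on every in-place write. The sketch below is one way to contrast the example above with the `out1 / 10` case from the question. It relies on the internal `_version` attribute, which is an implementation detail rather than a stable API, and `backward_fails`, `sqrt_case` and `div_case` are hypothetical helpers written for this illustration. As far as I understand, `sqrt` saves its output for the backward pass, while dividing by a Python scalar does not need `out1` at all, so overwriting `out1` goes unnoticed.

```python
import torch

def backward_fails(make_graph):
    """Build a small graph, overwrite the intermediate tensor in place,
    and report (did backward raise?, version before, version after)."""
    x = torch.ones(5, requires_grad=True)
    mid, out = make_graph(x)
    before = mid._version          # in-place modification counter
    mid[0] = -1                    # in-place write on the intermediate
    after = mid._version
    try:
        out.sum().backward()
        return False, before, after
    except RuntimeError:
        return True, before, after

def sqrt_case(x):
    # sqrt's backward formula uses its own output, so x2 is saved
    x2 = (x + 1).sqrt()
    return x2, x2 - 10

def div_case(x):
    # pow saves its input x; division by a scalar saves no tensor
    out1 = x ** 2
    return out1, out1 / 10

sqrt_result = backward_fails(sqrt_case)
div_result = backward_fails(div_case)
print("sqrt case:", sqrt_result)
print("div case: ", div_result)
```

In both cases the counter on the intermediate tensor goes up, but only the `sqrt` case should fail, because only there does the backward pass read the tensor that was overwritten.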