How to transform output of NN, while still being able to train? Answers

【Question title】: How to transform output of NN, while still being able to train?
【Posted】: 2022-01-22 08:10:02
【Question description】:

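I have a neural network that produces output. I want to transform output before the loss is computed and backpropagation happens.

Here is my general code: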

with torch.set_grad_enabled(training):
    outputs = net(x_batch[:, 0], x_batch[:, 1])  # the prediction of the NN
    # My issue is here:
    outputs = transform_torch(outputs)
    loss = my_loss(outputs, y_batch)

    if training:
        scheduler.step()
        loss.backward()
        optimizer.step()

Following the suggestion in How to transform output of neural network and still train?, I have a transformation function that I pass the output through:

def transform_torch(predictions):
    new_tensor = []
    for i in range(int(len(predictions))):
      arr = predictions[i]
      a = arr.clone().detach() 
      
      # My transformation, which results in a positive first element, and the other elements represent decrements of the first positive element.
     
      b = torch.negative(a)
      b[0] = abs(b[0])
      new_tensor.append(torch.cumsum(b, dim = 0))

      # new_tensor[i].requires_grad = True
    new_tensor = torch.stack(new_tensor, 0)    

    return new_tensor

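Note: besides clone().detach(), I also tried the methods described in Pytorch preferred way to copy a tensor, with similar results.

My problem is that no training actually happens through this transformed tensor.

If I try to modify the tensor in place (for example, by modifying arr directly), Torch complains that I cannot modify a tensor with gradients in place.

Any suggestions?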

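【Question comments】:

Definitely do not call detach; it cuts off the gradients. And what is the point of b? It is never used.
I fixed the b variable. The reason I currently call detach is that without it I cannot edit the tensor without PyTorch complaining.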

Tags: python deep-learning neural-network pytorch backpropagation

【Solution 1】:

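Calling detach on your predictions stops gradients from propagating back to your model. Nothing you do to the detached tensor afterwards can change your parameters.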
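As a rough illustration of that point, the sketch below compares a detached copy with an ordinary out-of-place operation. The tensor `pred` is only a hypothetical stand-in for the network output, not the asker's actual model.

```python
import torch

# Hypothetical stand-in for the network output: a tensor that requires grad.
pred = torch.rand(3, 4, requires_grad=True)

# detach() returns a tensor that is cut off from the autograd graph.
detached = pred.detach()
print(detached.requires_grad)  # no longer tracked by autograd
print(detached.grad_fn)        # no backward function is attached

# An out-of-place operation keeps the result connected to pred.
kept = -pred
print(kept.requires_grad)
print(kept.grad_fn)
```

Anything built from `detached` has no path back to `pred`, while anything built from `kept` does.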

Here is how you can modify your code to avoid this:

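import torch

def transform_torch(predictions):
  # abs() of the first slice along dim 1, negation of the rest; no detach, no in-place edits
  b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
  new_tensor = torch.cumsum(b, dim=1)
  return new_tensor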

You can run a small test to verify that gradients propagate through this transformation:

# start with some random tensor representing the input predictions
# make sure it requires_grad
pred = torch.rand((4, 5, 2, 3)).requires_grad_(True)
# transform it
tpred = transform_torch(pred)

# make up some "default" loss function and back-prop
tpred.mean().backward()

# check to see all gradients of the original prediction:
pred.grad
# as you can see, all gradients are non-zero
Out[]:
tensor([[[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],


        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],


        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]],


        [[[ 0.0417,  0.0417,  0.0417],
          [ 0.0417,  0.0417,  0.0417]],

         [[-0.0333, -0.0333, -0.0333],
          [-0.0333, -0.0333, -0.0333]],

         [[-0.0250, -0.0250, -0.0250],
          [-0.0250, -0.0250, -0.0250]],

         [[-0.0167, -0.0167, -0.0167],
          [-0.0167, -0.0167, -0.0167]],

         [[-0.0083, -0.0083, -0.0083],
          [-0.0083, -0.0083, -0.0083]]]])

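If you try this small test with your original code, you will either get an error saying that you are trying to propagate through tensors that do not have requires_grad set, or you will get no grads for the input pred.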
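To make the comparison concrete, here is a minimal sketch that runs both versions on the same input. `transform_detached` is a hypothetical name for a condensed copy of the question's loop-based function, and the 2-D input shape is an assumption made for brevity.

```python
import torch

def transform_detached(predictions):
    # Condensed version of the question's function: detach() cuts the graph.
    rows = []
    for row in predictions:
        b = torch.negative(row.clone().detach())
        b[0] = abs(b[0])
        rows.append(torch.cumsum(b, dim=0))
    return torch.stack(rows, 0)

def transform_torch(predictions):
    # The vectorized version from this answer.
    b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
    return torch.cumsum(b, dim=1)

# Offset by 0.1 so no entry is exactly zero (abs has zero gradient there).
pred = (torch.rand(4, 5) + 0.1).requires_grad_(True)

old = transform_detached(pred)
new = transform_torch(pred)
same_values = torch.allclose(old, new)  # both compute the same numbers

# Back-propagating through the detached version fails.
try:
    old.mean().backward()
    error = None
except RuntimeError as exc:
    error = exc

# The vectorized version fills pred.grad.
new.mean().backward()
print(same_values, type(error).__name__, pred.grad is not None)
```

Both functions produce the same values on this input; only the second one stays connected to `pred`.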

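【Comments】:

Thanks for the reply, Shai. However, I am noticing that only the first value in the tensor is being trained. Any suggestions on this?
@GalenBlueTalon Why only the first? The first what, channel?
Thank you, Shai!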

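【Solution 2】:

What about extracting the grad from the tensor with something like this

  grad = output.grad

and assigning the same gradient to the new tensor after the transformation?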

【Comments】:

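You are effectively proposing to implement the backward function of transform_torch explicitly. This is not always trivial and is easy to get wrong. But you can do it; see this tutorial on how to implement a custom function together with its derivative.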
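For reference, a hand-written backward for this particular transformation might look like the sketch below. The class name `CumsumTransform` is hypothetical, and the check against PyTorch's own autograd at the end is an addition for illustration, not part of the thread.

```python
import torch

class CumsumTransform(torch.autograd.Function):
    """Same forward as transform_torch, with an explicit backward."""

    @staticmethod
    def forward(ctx, predictions):
        ctx.save_for_backward(predictions)
        b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
        return torch.cumsum(b, dim=1)

    @staticmethod
    def backward(ctx, grad_output):
        (predictions,) = ctx.saved_tensors
        # The gradient of cumsum is a reversed cumulative sum of the incoming gradient.
        flipped = torch.flip(grad_output, dims=[1])
        grad_b = torch.flip(torch.cumsum(flipped, dim=1), dims=[1])
        # d|x|/dx = sign(x) for the first slice, d(-x)/dx = -1 for the rest.
        grad_first = grad_b[:, :1, ...] * torch.sign(predictions[:, :1, ...])
        grad_rest = -grad_b[:, 1:, ...]
        return torch.cat([grad_first, grad_rest], dim=1)

# Compare the manual backward with the one autograd derives on its own.
x = torch.randn(4, 5, 2, 3, dtype=torch.double, requires_grad=True)
weights = torch.randn(4, 5, 2, 3, dtype=torch.double)

(CumsumTransform.apply(x) * weights).sum().backward()
manual_grad = x.grad.clone()

x.grad = None
b = torch.cat([x[:, :1, ...].abs(), -x[:, 1:, ...]], dim=1)
(torch.cumsum(b, dim=1) * weights).sum().backward()
autograd_grad = x.grad.clone()

grads_match = torch.allclose(manual_grad, autograd_grad)
print(grads_match)
```

Since the vectorized transform_torch already gets these gradients from autograd for free, a custom function like this is mainly useful when the transformation cannot be expressed with differentiable tensor operations.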