Training with parametric partial derivatives in pytorch: answers

【Question title】: Training with parametric partial derivatives in pytorch
【Posted】: 2021-02-23 21:55:37
【Question description】:

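Given a neural network with weights theta and input x, I am interested in computing the partial derivatives of the network's output w.r.t. x, so that I can use the result when training the weights theta with a loss that depends on both the output and its partial derivatives. I figured out how to compute the partial derivatives by following this post. I also found this post, which explains how to achieve something similar with sympy; however, adapting it to a neural network in pytorch looks like a lot of work and gives very slow code.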

So I tried a different approach, which failed. As a minimal example, I created a function that stands in for my neural network:

theta = torch.ones([3], requires_grad=True, dtype=torch.float32)
def trainable_function(time):
    return theta[0]*time**3 + theta[1]*time**2 + theta[2]*time

Then I defined a second function that returns the partial derivative:

def trainable_derivative(time):
    deriv_time = torch.tensor(time, requires_grad=True)
    fun_value = trainable_function(deriv_time)
    gradient = torch.autograd.grad(fun_value, deriv_time, create_graph=True, retain_graph=True)
    deriv_time.requires_grad = False
    return gradient

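Given some noisy observations of the derivative, I now try to train theta. For simplicity, I created a loss that depends only on the derivative. In this minimal example the derivative is used directly as the observation rather than as a regularizer, to avoid a complicated loss function.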

def objective(train_times, observations):
    predictions = torch.squeeze(torch.tensor([trainable_derivative(a) for a in train_times]))
    return torch.sum((predictions - observations)**2)

optimizer = Adam([theta], lr=0.1)
for iteration in range(200):
    optimizer.zero_grad()
    loss = objective(data_times, noisy_targets)
    loss.backward()
    optimizer.step()

Unfortunately, running this code produces the error

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

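I suspect that computing the partial derivative the way I do does not really create a computational graph that autodiff can differentiate through. The connection to the parameters theta is therefore somehow lost, and to the optimizer it now looks as if the loss were completely independent of theta. I may be completely wrong, though.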
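One way to test this suspicion is to inspect `requires_grad` and `grad_fn` at each stage. The sketch below is illustrative only and is not part of the original post: the helper `derivative` and the two sample times are made up for the check. It compares rebuilding the list of derivatives with `torch.tensor`, as in `objective`, against joining it with `torch.stack`.

```python
import torch

theta = torch.ones(3, requires_grad=True)

def derivative(time):
    # Hypothetical helper mirroring trainable_derivative from the question
    t = torch.tensor(time, requires_grad=True)
    value = theta[0] * t**3 + theta[1] * t**2 + theta[2] * t
    (grad,) = torch.autograd.grad(value, t, create_graph=True)
    return grad

parts = [derivative(0.5), derivative(1.0)]

# Each individual derivative is still attached to theta
print(parts[0].requires_grad, parts[0].grad_fn)

# torch.tensor copies the values into a fresh leaf tensor with no history
rebuilt = torch.tensor(parts)
print(rebuilt.requires_grad, rebuilt.grad_fn)

# torch.stack is an ordinary differentiable op, so the graph is preserved
stacked = torch.stack(parts)
print(stacked.requires_grad, stacked.grad_fn)
```

If the second print shows no `grad_fn`, the graph is built correctly inside the derivative function and is only dropped when the results are collected.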

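Does anyone know how to fix this? Is it possible to include this kind of derivative in a loss function in pytorch? If so, what would be the most pytorch-style way of doing it?

Many thanks for your help and advice.

For completeness:

To run the code above, some training data has to be generated. I used the following code, which runs fine and has been tested against the analytical derivative: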

true_a = 1
true_b = 1
true_c = 1


def true_function(time):
    return true_a*time**3 + true_b*time**2 + true_c*time


def true_derivative(time):
    deriv_time = torch.tensor(time, requires_grad=True)
    fun_value = true_function(deriv_time)
    return torch.autograd.grad(fun_value, deriv_time)

data_times = torch.linspace(0, 1, 500)
true_targets = torch.squeeze(torch.tensor([true_derivative(a) for a in data_times]))
noisy_targets = torch.tensor(true_targets) + torch.randn_like(true_targets)*0.1

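【Question comments】:

Tags: python pytorch

【Solution 1】:

Your approach to the problem seems overly complicated. I believe what you want to achieve is well within reach in PyTorch. Here is a simple code snippet that I believe shows what you are trying to do: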

import torch
import torch.nn as nn

# Data and Function
torch.manual_seed(0) 
input_dim  = 1
output_dim = 2
n = 10 # batchsize
simple_function = nn.Sequential(nn.Linear(1, 2), nn.Sigmoid())
t = (torch.arange(n).float() / n).view(n, 1) 
x = torch.randn(n, output_dim)
t.requires_grad = True

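# Actual computation
xhat = simple_function(t)
# Full Jacobian has shape (n, output_dim, n, input_dim) = (10, 2, 10, 1)
jac = torch.autograd.functional.jacobian(simple_function, t, create_graph=True)
# Keep only the entries where output sample i is differentiated w.r.t. input sample i: shape (n, output_dim)
grad = jac[torch.arange(n), :, torch.arange(n), 0]
# Data-fit term on the output plus a penalty on its derivative
loss = (x - xhat).pow(2).sum() + grad.pow(2).sum()
loss.backward()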
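The full Jacobian above also contains the cross-sample blocks, which are zero here because each output row depends only on its own input row. As a possible alternative (a sketch, not the answerer's method), the same per-sample derivatives can be obtained with `torch.autograd.grad` by summing one output column over the batch at a time. The names `model`, `cols` and `grad_batched` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 10  # batch size
model = nn.Sequential(nn.Linear(1, 2), nn.Sigmoid())
t = (torch.arange(n).float() / n).view(n, 1).requires_grad_(True)

xhat = model(t)  # shape (n, 2)

# Row i of xhat depends only on row i of t, so differentiating the sum of
# column k w.r.t. t yields d xhat[i, k] / d t[i] for every i in one call.
cols = [
    torch.autograd.grad(xhat[:, k].sum(), t, create_graph=True)[0]
    for k in range(xhat.shape[1])
]
grad_batched = torch.cat(cols, dim=1)  # shape (n, 2)

# Cross-check against the diagonal blocks of the full Jacobian
jac = torch.autograd.functional.jacobian(model, t, create_graph=True)
grad_jac = jac[torch.arange(n), :, torch.arange(n), 0]
print(torch.allclose(grad_batched, grad_jac, atol=1e-6))

# create_graph=True keeps the derivative differentiable w.r.t. the weights
grad_batched.pow(2).sum().backward()
```

This assumes the samples in the batch do not interact; a layer such as batch normalization in training mode would break that assumption.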

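【Discussion】:

Thanks for the feedback. Interestingly, the problem lies in the line predictions = torch.squeeze(torch.tensor([trainable_derivative(a) for a in train_times])); pytorch seems to have trouble with the list construction. If predictions is created with torch.zeros and then filled element by element in a for loop, everything works. You are right that this approach is overly complicated, though; it has to look like this within the larger structure of our code framework.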
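A minimal sketch of the fix described in this comment, using `torch.stack` instead of filling a `torch.zeros` tensor in a loop (both keep the autograd graph). Several details are assumptions made for illustration and differ from the question: theta starts at zeros so that training visibly moves it, only 50 time points are used, and the targets come from the analytical derivative 3t^2 + 2t + 1 plus noise.

```python
import torch

torch.manual_seed(0)

# Assumption: start from zeros (the question used ones, which is already the optimum)
theta = torch.zeros(3, requires_grad=True)

def trainable_function(time):
    return theta[0] * time**3 + theta[1] * time**2 + theta[2] * time

def trainable_derivative(time):
    # Fresh leaf copy of the input so we can differentiate w.r.t. it
    deriv_time = time.detach().clone().requires_grad_(True)
    fun_value = trainable_function(deriv_time)
    # autograd.grad returns a tuple; create_graph=True keeps the link to theta
    (gradient,) = torch.autograd.grad(fun_value, deriv_time, create_graph=True)
    return gradient

def objective(train_times, observations):
    # torch.stack preserves the graph, unlike torch.tensor([...])
    predictions = torch.stack([trainable_derivative(a) for a in train_times])
    return torch.sum((predictions - observations) ** 2)

# Assumption: smaller data set and analytical targets, for a quick check
data_times = torch.linspace(0, 1, 50)
noisy_targets = 3 * data_times**2 + 2 * data_times + 1 + 0.1 * torch.randn(50)

optimizer = torch.optim.Adam([theta], lr=0.1)
losses = []
for iteration in range(200):
    optimizer.zero_grad()
    loss = objective(data_times, noisy_targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1], theta.detach())
```

Because the function is applied elementwise here, the Python loop could probably be replaced by a single `torch.autograd.grad` call on the summed output over all times, which avoids one autograd call per sample.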