Can't understand the grad() method in a custom Theano Op

【Posted】: 2017-05-05 06:37:53
【Question】:

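While reading the documentation on creating a new Op, I couldn't understand the grad() method in the example at http://deeplearning.net/software/theano/extending/extending_theano.html#example-op-definition. Why does it return output_grads[0] * 2 instead of just 2? What does output_grads[0] represent?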
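A minimal pure-Python sketch of what that grad() computes (no Theano needed; `double_forward`, `double_grad` and the downstream cost C = f**3 are hypothetical stand-ins, not the docs' code). The finite-difference check shows that multiplying by output_grads[0] gives dC/dx, whereas returning a bare 2 would not.

```python
# Pure-Python stand-in for the doubling Op; no Theano required.
# Hypothetical setup: f = 2*x (the Op) feeds a downstream cost C = f**3.

def double_forward(x):
    """Forward pass of the doubling Op: f = 2 * x."""
    return 2.0 * x

def double_grad(inputs, output_grads):
    """Mirror of the doc's grad(): return [dC/dx] given [dC/df]."""
    return [output_grads[0] * 2.0]  # chain rule: dC/dx = dC/df * df/dx

def cost(x):
    """Hypothetical cost built on top of the Op's output."""
    return double_forward(x) ** 3

x = 1.5
f = double_forward(x)
dC_df = 3.0 * f ** 2  # derivative of C = f**3 w.r.t. f, i.e. output_grads[0]
analytic = double_grad([x], [dC_df])[0]

# Central finite difference of the full cost w.r.t. x.
eps = 1e-6
numeric = (cost(x + eps) - cost(x - eps)) / (2 * eps)

print(analytic, numeric)  # both close to 24 * x**2 = 54.0
```

Returning a bare 2 here would give 2 instead of roughly 54, because it ignores how C depends on f.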

If output_grads[0] means the chained derivative with respect to the input x, then why, in the next example at http://deeplearning.net/software/theano/extending/extending_theano.html#example-props-definition, does grad() return a * output_grads[0] + b (which should be self.a * output_grads[0] + self.b) instead of self.a * output_grads[0]?
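A quick numerical check of the two candidate gradients for f = a*x + b, again in plain Python. The constants `A`, `B` stand in for self.a and self.b, and the cost C = f**2 is a hypothetical downstream use of the output; none of this is Theano code.

```python
# Pure-Python check of the ax + b example's gradient (no Theano required).
# f = a*x + b with fixed parameters a, b; a hypothetical cost C = f**2 sits downstream.

A, B = 3.0, 5.0  # hypothetical values standing in for self.a and self.b

def axpb_forward(x):
    return A * x + B

def grad_correct(output_grads):
    # df/dx = a, so dC/dx = dC/df * a; the constant b drops out.
    return [A * output_grads[0]]

def grad_as_written(output_grads):
    # The version questioned above, which adds b to the gradient.
    return [A * output_grads[0] + B]

def cost(x):
    return axpb_forward(x) ** 2

x = 2.0
dC_df = 2.0 * axpb_forward(x)  # output_grads[0] for C = f**2
eps = 1e-6
numeric = (cost(x + eps) - cost(x - eps)) / (2 * eps)

print(grad_correct([dC_df])[0], grad_as_written([dC_df])[0], numeric)
```

Only the version without + b agrees with the finite difference, since the derivative of a constant is zero.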

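What about more complex custom Ops? For something like y = exp(x1)/(a*(x1**3)+log(x2)), how would grad() be written? And how should grad() be written if the inputs are vectors or matrices?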
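A worked derivation for that function, assuming a is a fixed parameter that is not differentiated and log is the natural log (D is just shorthand for the denominator). grad() would return each partial derivative multiplied by output_grads[0]; for elementwise vector or matrix inputs the same formulas apply entry by entry, with all products taken elementwise.

```latex
\begin{aligned}
y &= \frac{e^{x_1}}{D}, \qquad D = a\,x_1^3 + \log x_2,\\
\frac{\partial y}{\partial x_1} &= \frac{e^{x_1} D - e^{x_1}\cdot 3a\,x_1^2}{D^2}
  = y\left(1 - \frac{3a\,x_1^2}{D}\right),\\
\frac{\partial y}{\partial x_2} &= -\frac{e^{x_1}}{D^2}\cdot\frac{1}{x_2}
  = -\frac{y}{x_2\,D},\\
\text{grad}(\ldots) &= \left[\, g\,\frac{\partial y}{\partial x_1},\; g\,\frac{\partial y}{\partial x_2} \,\right],
  \qquad g = \texttt{output\_grads[0]} = \frac{\partial C}{\partial y}.
\end{aligned}
```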

【Comments】:

【Answer 1】:

As the extended .grad() documentation points out, the output_grads argument is

$$\frac{\partial C}{\partial f}$$

(where f is one of your Op's outputs and C is the cost on which you called theano.tensor.grad(...))

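The page also says that your Op's .grad(...) method must return

$$\frac{\partial C}{\partial x} = \frac{\partial C}{\partial f}\,\frac{\partial f}{\partial x}$$

(where x is an input to your Op)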
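To make that contract concrete for the multi-input, vector case from the question, here is a pure-Python sketch. `ExpRatioOp`, its `perform`/`grad` methods and the cost C = sum(y**2) are hypothetical stand-ins that only mimic the shape of a Theano Op (lists instead of tensors, no Apply nodes); the finite-difference check is the same idea as Theano's own gradient verification, done by hand.

```python
import math

# Hypothetical pure-Python stand-in for an elementwise Op computing
#   y = exp(x1) / (a * x1**3 + log(x2)),
# mimicking the grad(inputs, output_grads) contract; not Theano's API.

class ExpRatioOp:
    def __init__(self, a):
        self.a = a  # fixed parameter, not differentiated

    def perform(self, x1, x2):
        return [math.exp(u) / (self.a * u ** 3 + math.log(v)) for u, v in zip(x1, x2)]

    def grad(self, inputs, output_grads):
        x1, x2 = inputs
        g = output_grads[0]  # dC/dy, one entry per element of y
        gx1, gx2 = [], []
        for u, v, gi in zip(x1, x2, g):
            d = self.a * u ** 3 + math.log(v)
            y = math.exp(u) / d
            gx1.append(gi * y * (1.0 - 3.0 * self.a * u ** 2 / d))  # g * dy/dx1
            gx2.append(gi * (-y / (v * d)))                          # g * dy/dx2
        return [gx1, gx2]

op = ExpRatioOp(a=0.5)
x1 = [0.3, 1.2]
x2 = [2.0, 3.5]

# Hypothetical cost C = sum(y**2), so output_grads[0] = 2*y elementwise.
def cost(x1, x2):
    return sum(v ** 2 for v in op.perform(x1, x2))

y = op.perform(x1, x2)
gx1, gx2 = op.grad([x1, x2], [[2.0 * v for v in y]])

# Finite-difference check on the first element of x1 and of x2.
eps = 1e-6
num_x1 = (cost([x1[0] + eps, x1[1]], x2) - cost([x1[0] - eps, x1[1]], x2)) / (2 * eps)
num_x2 = (cost(x1, [x2[0] + eps, x2[1]]) - cost(x1, [x2[0] - eps, x2[1]])) / (2 * eps)
print(gx1[0], num_x1)
print(gx2[0], num_x2)
```

Each returned gradient has the same shape as the corresponding input, which is what lets the caller keep chaining.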

I think the ax+b example is wrong. If you look at the actual code of, for example, Sigmoid or XLogX, it appears to simply implement the chain rule.
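A sketch of that chain-rule pattern for a sigmoid, written in plain Python with hypothetical names; it follows the same idea (upstream gradient times local derivative y(1 - y)) but is not copied from Theano's source.

```python
import math

# Sketch of a sigmoid Op's grad() in the same chain-rule pattern
# (pure Python, hypothetical names; not Theano's source).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(inputs, output_grads):
    (x,) = inputs
    (gz,) = output_grads           # dC/dy supplied by the caller
    y = sigmoid(x)
    return [gz * y * (1.0 - y)]    # dC/dx = dC/dy * dy/dx, with dy/dx = y(1 - y)

# With gz = 1 the result is just the local derivative; at x = 0 that is 0.25.
print(sigmoid_grad([0.0], [1.0]))   # [0.25]
print(sigmoid_grad([0.0], [4.0]))   # [1.0]
```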

Disclaimer: I haven't implemented a custom Op myself yet; I'm researching this too, and this is just my understanding.

【Discussion】:

See also: deeplearning.net/software/theano/sandbox/…
Thanks, LDGN! I think you're right. output_grads[0] represents dC/df, where f is the Op's output, i.e. y1. If y1 is not an input to any downstream Op, output_grads[0] is set to 1.
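A minimal reverse-mode sketch of that last point, assuming the Op's output is itself the cost so that dC/dC = 1 is the starting value. The chain x → double → square and the helper names are hypothetical; each grad() only multiplies the incoming output_grads[0] by its local derivative, and with a seed of 1 the doubling Op's grad returns exactly 2.

```python
# Minimal reverse-mode sketch (pure Python, hypothetical helper names):
# x -> double (f = 2x) -> square (C = f**2). Backprop starts from dC/dC = 1.

def double_grad(inputs, output_grads):
    return [output_grads[0] * 2.0]

def square_grad(inputs, output_grads):
    (f,) = inputs
    return [output_grads[0] * 2.0 * f]

x = 1.5
f = 2.0 * x
C = f ** 2

seed = 1.0                              # output_grads[0] for the Op whose output is C itself
dC_df = square_grad([f], [seed])[0]     # 2 * f = 6.0
dC_dx = double_grad([x], [dC_df])[0]    # 6.0 * 2 = 12.0
print(dC_df, dC_dx)

# If the doubling Op's output were itself the cost, its grad would get the seed directly:
print(double_grad([x], [seed]))         # [2.0]
```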