Gradient descent: should the delta value be a scalar or a vector? (Answers)

【Question title】: Gradient descent: should the delta value be a scalar or a vector?
【Posted】: 2016-09-08 17:30:33
【Question】:

When computing the delta values of a neural network after running backpropagation:

the value of delta(1) comes out as a scalar. Shouldn't it be a vector?

Update:

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically:

【Question comments】:

Is there a reference for the formula?
@greeness please see the update

Tags: machine-learning neural-network gradient-descent

【Answer 1】:

First, as you probably know, each layer has n x m parameters (weights) to learn, so they form a 2-D matrix.

n is the number of nodes in the current layer plus 1 (for bias)
m is the number of nodes in the previous layer.

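There are n x m parameters because every node in the previous layer is connected to every node in the current layer.

I'm fairly sure that the Delta (capital Delta) of layer L is used to accumulate the partial-derivative terms for each parameter of layer L. So each layer has its own 2-D Delta matrix. The entry in row i (the i-th node of the current layer) and column j (the j-th node of the previous layer) is updated as: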

D_(i,j) = D_(i,j) + a_j * delta_i
note a_j is the activation from the j-th node in previous layer,
     delta_i is the error of the i-th node of the current layer
so we accumulate the error proportional to their activation weight.

So, to answer your question: Delta should be a matrix.
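To make the update rule above concrete, here is a minimal NumPy sketch. It is only an illustration under assumptions that are not in the original post: the layer sizes `n = 3` and `m = 4` are made up, the values are random, and the bias term is left out to keep the shapes simple. It shows that accumulating `a_j * delta_i` entry by entry gives the same n x m matrix as a single outer product.

```python
import numpy as np

# Hypothetical sizes (assumption): n nodes in the current layer,
# m nodes in the previous layer; bias omitted for simplicity.
n, m = 3, 4
rng = np.random.default_rng(0)
delta = rng.standard_normal(n)  # error of each node in the current layer
a = rng.standard_normal(m)      # activation of each node in the previous layer

# Entry-by-entry accumulation: D_(i,j) = D_(i,j) + a_j * delta_i
D_loop = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        D_loop[i, j] += a[j] * delta[i]

# The same accumulation written as one outer product.
D_outer = np.zeros((n, m))
D_outer += np.outer(delta, a)

print(D_loop.shape)                   # shape of the accumulator
print(np.allclose(D_loop, D_outer))   # whether both forms agree
```

In a real training loop this accumulation would be repeated once per training example, with `delta` and `a` recomputed each time.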

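【Comments】:

Thanks, but my question is why the output is a scalar rather than a matrix, since error * (a)transpose is a scalar. Maybe the link I pointed to is incorrect?
The error is n x 1 and the transpose of a is 1 x m, so the product is n x m. You probably computed (1 x n) x (n x 1), which is why it came out as a scalar.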
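The shape argument in the last comment can be checked with a short NumPy sketch. The numbers here are made up for illustration (n = 3, m = 4 are assumptions, not values from the question); the point is only how the orientation of the vectors decides whether the product is a matrix or a single number.

```python
import numpy as np

# Explicit column vectors with made-up values (assumption).
delta = np.array([[0.1], [-0.2], [0.3]])      # n x 1, n = 3
a = np.array([[1.0], [0.5], [0.25], [2.0]])   # m x 1, m = 4

# (n x 1) @ (1 x m) gives an n x m matrix: the shape Delta should have.
outer = delta @ a.T

# (1 x n) @ (n x 1) collapses to 1 x 1: effectively a scalar.
inner = delta.T @ delta

print(outer.shape)
print(inner.shape)
```

If the product comes out as 1 x 1, the transpose is most likely on the wrong operand.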