Neural network does not learn when it has more than 1 hidden layer

【Question title】: Neural Network not learning when has more than 1 hidden layers
【Posted】: 2021-08-12 13:04:27
【Question description】:

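I am implementing my first neural network as my high-school final thesis. When I train it on the MNIST dataset I get good results, but only with 1 hidden layer. If I use more than 1 hidden layer, the network always gives the same output after training. I tried re-deriving the derivative of the error function for more than one layer, but I must be missing something... Here is the code of my backpropagation method: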

    public void BackPropagation(double[] error, bool batch)
    {
        double[][] temp = null;
        temp = NNMath.ArrayToMatrix(NNMath.EntryWiseProduct(error, NNMath.SigmoidDerivativeFromSigmoid(this.A[this.A.Length - 1])));
        this.DW[this.DW.Length - 1] = NNMath.TransposeMatrix(NNMath.DotProduct(NNMath.TransposeMatrix(temp), NNMath.ArrayToMatrix(this.A[this.DW.Length - 1])));
        temp[0].CopyTo(this.DB[this.DB.Length - 1], 0);

        for (int i = this.W.Length - 1; i > 0; i--)
        {
            temp = NNMath.DotProduct(temp, NNMath.TransposeMatrix(this.W[i]));
            temp = NNMath.EntryWiseProduct(temp, NNMath.ArrayToMatrix(NNMath.SigmoidDerivativeFromSigmoid(this.A[i])));
            if (batch)
            {
                this.DW[i - 1] = NNMath.EntryWiseSum(this.DW[i - 1], NNMath.DotProduct(NNMath.TransposeMatrix(this.A[i - 1]), temp));
                this.DB[i - 1] = NNMath.EntryWiseSum(this.DB[i - 1], temp[0]);
            }
            else
            {
                this.DW[i - 1] = NNMath.DotProduct(NNMath.TransposeMatrix(this.A[i - 1]), temp);
                temp[0].CopyTo(this.DB[i - 1], 0);
            }
        }
    }

I created a static class called NNMath that performs the matrix operations.

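this.A is a 2D array in which each row holds the activations of one layer.
this.W is a 3D array in which each element is the weight matrix between 2 layers.
this.DW has the same shape as this.W but holds the computed derivatives.
this.DB is a 2D array that holds the derivatives of the biases.
batch is true if the method is called during batch training.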
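The shapes described above can be sketched as jagged arrays. This is only an illustration: the layer sizes and the helper `NewMatrix` are hypothetical, and the orientation of each weight matrix (rows = source layer, columns = target layer) is inferred from how the posted method multiplies the matrices, not stated in the post.

```csharp
using System;

public static class ShapeSketch
{
    public static void Main()
    {
        // Hypothetical layer sizes: input, two hidden layers, output.
        int[] layerSizes = { 784, 16, 16, 10 };
        int layers = layerSizes.Length;

        double[][] A = new double[layers][];           // A[i]: activations of layer i (A[0] is the input)
        double[][][] W = new double[layers - 1][][];   // W[i]: weights from layer i to layer i + 1
        double[][][] DW = new double[layers - 1][][];  // DW[i]: derivatives, same shape as W[i]
        double[][] DB = new double[layers - 1][];      // DB[i]: bias derivatives of layer i + 1

        for (int i = 0; i < layers; i++)
            A[i] = new double[layerSizes[i]];

        for (int i = 0; i < layers - 1; i++)
        {
            W[i] = NewMatrix(layerSizes[i], layerSizes[i + 1]);
            DW[i] = NewMatrix(layerSizes[i], layerSizes[i + 1]);
            DB[i] = new double[layerSizes[i + 1]];
        }

        Console.WriteLine($"W[0] is {W[0].Length} x {W[0][0].Length}"); // prints "W[0] is 784 x 16"
    }

    // Hypothetical helper: allocates a rows x cols jagged matrix filled with zeros.
    private static double[][] NewMatrix(int rows, int cols)
    {
        var m = new double[rows][];
        for (int r = 0; r < rows; r++)
            m[r] = new double[cols];
        return m;
    }
}
```

With this layout there is one weight matrix and one bias row fewer than there are activation rows, which is why the posted method can use `this.DW.Length - 1` to index both the last weight matrix and the activations of the last hidden layer.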

I use MSE as the loss function.
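For reference, the quantities the method above computes can be written out as follows. This is a sketch under assumptions that the post does not state: activations are row vectors $a_i$ (matching `this.A[i]`), $W_i$ connects layer $i$ to layer $i+1$ (matching `this.W[i]`), $b_i$ is the bias of layer $i+1$ (matching `this.DB[i]`), the loss is taken as $E = \tfrac{1}{2}\sum_k (a_{L,k} - y_k)^2$, and the `error` argument is $e = a_L - y$. The sign convention and constant factor in the actual code may differ.

```latex
\begin{aligned}
\delta_L &= e \odot a_L \odot (1 - a_L) \\
\delta_i &= \left(\delta_{i+1} W_i^{\top}\right) \odot a_i \odot (1 - a_i), \qquad i = L-1, \dots, 1 \\
\frac{\partial E}{\partial W_i} &= a_i^{\top}\, \delta_{i+1} \\
\frac{\partial E}{\partial b_i} &= \delta_{i+1}
\end{aligned}
```

In the code, `temp` plays the role of $\delta$: it starts as $\delta_L$ and is replaced by $\delta_i$ on each pass through the loop.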

Thanks in advance!

EDIT: Here is some more code from NNMath

    public static double[] EntryWiseSum(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            return null;
        double[] c = new double[a.Length];
        for (int i = 0; i < a.Length; i++)
            c[i] = a[i] + b[i];
        return c;
    }

    public static double SigmoidDerivativeFromSigmoid(double sigmoidA)
    {
        return sigmoidA * (1.0 - sigmoidA);
    }

    public static double[] SigmoidDerivativeFromSigmoid(double[] a)
    {
        double[] res = new double[a.Length];
        for (int i = 0; i < a.Length; i++)
            res[i] = SigmoidDerivativeFromSigmoid(a[i]);
        return res;
    }

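【Comments】:

Does your Hadamard product method (NNMath.EntryWiseProduct) return an array rather than a matrix?
It returns an array if the 2 inputs are arrays, and a matrix if both are matrices.
Where are your biases? Are they just another row in each layer's matrix?
I have a matrix in which the rows represent layers and the elements are the biases of the nodes; the input layer has no biases.
I'm not sure how you implemented the dot product, but since the products are not cumulative, make sure you multiply them so that they scale backwards through the layers. Also, you may be multiplying by the wrong matrix here, NNMath.TransposeMatrix(this.W[i]); normally you would take the dot product of the gradient with the transposed output of the previous layer. You probably meant to use this.DW.

Tags: c# neural-network backpropagation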

【Solution 1】:

I found my mistake. In case anyone is wondering, here is the corrected method.

    public void BackPropagationNew(double[] error, bool batch)
    {
        // Output-layer delta: error times the sigmoid derivative of the output activations.
        double[][] temp = null;
        temp = NNMath.ArrayToMatrix(NNMath.EntryWiseProduct(error, NNMath.SigmoidDerivativeFromSigmoid(this.A[this.A.Length - 1])));
        if (batch)
        {
            // Batch mode: add this sample's gradient to the running totals.
            this.DW[this.DW.Length - 1] = NNMath.EntryWiseSum(this.DW[this.DW.Length - 1], NNMath.TransposeMatrix(NNMath.DotProduct(NNMath.TransposeMatrix(temp), NNMath.ArrayToMatrix(this.A[this.DW.Length - 1]))));
            this.DB[this.DB.Length - 1] = NNMath.EntryWiseSum(this.DB[this.DB.Length - 1], temp[0]);
        }
        else
        {
            // Single-sample mode: overwrite the stored gradient.
            this.DW[this.DW.Length - 1] = NNMath.TransposeMatrix(NNMath.DotProduct(NNMath.TransposeMatrix(temp), NNMath.ArrayToMatrix(this.A[this.DW.Length - 1])));
            temp[0].CopyTo(this.DB[this.DB.Length - 1], 0);
        }

        // Walk backwards through the remaining layers.
        for (int i = this.W.Length - 1; i > 0; i--)
        {
            // Propagate the delta back through W[i], then apply the sigmoid derivative of A[i].
            temp = NNMath.DotProduct(temp, NNMath.TransposeMatrix(this.W[i]));
            temp = NNMath.EntryWiseProduct(temp, NNMath.ArrayToMatrix(NNMath.SigmoidDerivativeFromSigmoid(this.A[i])));
            if (batch)
            {
                this.DW[i - 1] = NNMath.EntryWiseSum(this.DW[i - 1], NNMath.DotProduct(NNMath.TransposeMatrix(this.A[i - 1]), temp));
                this.DB[i - 1] = NNMath.EntryWiseSum(this.DB[i - 1], temp[0]);
            }
            else
            {
                this.DW[i - 1] = NNMath.DotProduct(NNMath.TransposeMatrix(this.A[i - 1]), temp);
                temp[0].CopyTo(this.DB[i - 1], 0);
            }
        }
    }

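I am using mini-batch training, and I forgot to check whether the method was running in batch mode for the first weight change (the output-layer update before the loop), so it was not actually changing the weights. Anyway, thanks to everyone who tried to help! I'll try to be a bit more careful next time.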
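The difference between overwriting and accumulating a gradient during mini-batch training can be shown with a small standalone sketch. Everything in it is hypothetical and not taken from the original network: the class and method names, the sample values, the learning rate, and the choice to average over the batch and reset the totals afterwards are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper, not part of the original NNMath class.
public static class BatchGradientSketch
{
    // Adds one sample's gradient into the running batch total, in place.
    public static void Accumulate(double[][] total, double[][] sample)
    {
        for (int r = 0; r < total.Length; r++)
            for (int c = 0; c < total[r].Length; c++)
                total[r][c] += sample[r][c];
    }

    // Applies W -= learningRate * (total / batchSize), then clears the total
    // so the next batch starts from zero.
    public static void ApplyAndReset(double[][] weights, double[][] total, int batchSize, double learningRate)
    {
        for (int r = 0; r < weights.Length; r++)
            for (int c = 0; c < weights[r].Length; c++)
            {
                weights[r][c] -= learningRate * total[r][c] / batchSize;
                total[r][c] = 0.0;
            }
    }

    public static void Main()
    {
        var weights = new[] { new[] { 0.5, -0.5 } };
        var total = new[] { new[] { 0.0, 0.0 } };

        // Made-up per-sample gradients for a batch of two samples.
        var sampleGradients = new[]
        {
            new[] { new[] { 0.2, 0.0 } },
            new[] { new[] { 0.4, 0.2 } },
        };

        // Every sample contributes to the total; assigning instead of adding
        // would keep only the last sample's gradient.
        foreach (var gradient in sampleGradients)
            Accumulate(total, gradient);

        ApplyAndReset(weights, total, sampleGradients.Length, 1.0);
        Console.WriteLine($"{weights[0][0]} {weights[0][1]}");
    }
}
```

Whichever layer's gradient is being stored, the batch branch has to add to the existing total for that layer, which is the check the corrected method adds for the output layer.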
