Implementing a neural network in Python

【Question title】: Implement a neural network in Python
【Posted】: 2018-10-28 01:23:52
【Question description】:

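I am trying to implement a neural network (NN) in Python using numpy, but my NN does not work as expected.

I computed the numerical gradient and compared it with the gradient computed by backpropagation, and the two agree, so the gradients appear to be right. However, the cost decreases very slowly and bounces back up after a few epochs.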
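
For readers who want to reproduce the gradient check described above, here is a minimal sketch, not the asker's actual check. It assumes the same two-layer sigmoid network and cross-entropy cost as the code further down. The helper names `cost`, `backprop_grads` and `numerical_grad`, the step size `h` and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, W2, b1, b2):
    # Same computation as the question's forward(), returning only z and y
    z = sigmoid(W1.dot(x) + b1)
    y = sigmoid(W2.dot(z) + b2)
    return z, y

def cost(x, y, W1, W2, b1, b2):
    # Cross-entropy cost of a single sample, returned as a Python float
    _, y_pred = forward(x, W1, W2, b1, b2)
    return (-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred)).item()

def backprop_grads(x, y, W1, W2, b1, b2):
    # Analytic gradients in the order dW1, dW2, db1, db2
    z, y_pred = forward(x, W1, W2, b1, b2)
    delta2 = y_pred - y                      # error at the output, shape (1,1)
    delta1 = W2.T.dot(delta2) * z * (1 - z)  # error at the hidden layer, shape (2,1)
    return delta1.dot(x.T), delta2.dot(z.T), delta1, delta2

def numerical_grad(f, param, h=1e-5):
    # Central differences; param is perturbed in place and then restored
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + h
        f_plus = f()
        param[idx] = old - h
        f_minus = f()
        param[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)               # seed chosen arbitrarily
W1 = rng.uniform(-0.12, 0.12, (2, 2))
b1 = rng.uniform(-0.12, 0.12, (2, 1))
W2 = rng.uniform(-0.12, 0.12, (1, 2))
b2 = rng.uniform(-0.12, 0.12, (1, 1))
x = np.array([[0.0], [1.0]])                 # one XOR sample
y = np.array([[1.0]])

analytic = backprop_grads(x, y, W1, W2, b1, b2)
f = lambda: cost(x, y, W1, W2, b1, b2)
numeric = [numerical_grad(f, p) for p in (W1, W2, b1, b2)]
max_diff = max(np.abs(a - n).max() for a, n in zip(analytic, numeric))
print('largest difference between analytic and numerical gradient:', max_diff)
```

A passing check only confirms the gradient formulas. It says nothing about the order in which parameters are updated or about the prediction code.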

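I am trying to solve the XOR problem. But my NN seems to ignore the input vector of each sample and tends to predict, for every sample, the fraction of samples labeled 1 (for example, if I feed it 3 positive samples and 1 negative sample, it predicts about 0.75 for all four).

Can anyone help me with this? It has been bothering me for a long time.

Here is the structure of the NN and some formulas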

structure of NN

formula

Here is my code

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(565113221)

def sigmoid(x): # sigmoid function 
    return 1/(1+np.power(np.e,-x))

def forward(x,W1,W2,b1,b2): # feed forward
    a = W1.dot(x)
    z = sigmoid(a+b1)
    b = W2.dot(z)
    y = sigmoid(b+b2)
    return a,z,b,y

def pred(X,W1,W2,b1,b2): # predict
    y_pred = np.zeros((X.shape[0],1))
    for i in range(X.shape[0]):
        _,_,_,y_pred[i] = forward(x.reshape((-1,1)),W1,W2,b1,b2)
    return y_pred

X = np.array([[0,0],[0,1],[1,0],[1,1]]) # features 4 * 2
Y = np.array([[0],[1],[1],[0]]) # labels 4 * 1

epsilon = 0.12 # initialize all weights between -0.12 ~ 0.12
W1 = np.random.random((2,2)) * epsilon * 2 - epsilon # map from input to hidden
b1 = np.random.random((2,1)) * epsilon * 2 - epsilon # bias1 
W2 = np.random.random((1,2)) * epsilon * 2 - epsilon # map from hidden to output
b2 = np.random.random((1,1)) * epsilon * 2 - epsilon # bias2
epoch = 50 # maximum training turns
alpha = 0.01 # learning rate
for turn in range(epoch):
    print('turn:',turn,end=' ')
    epoch_cost = 0
    for index in range(X.shape[0]):
        x = X[index,:].reshape((-1,1))
        y = Y[index,:].reshape((-1,1))
        a,z,b,y_pred = forward(x,W1,W2,b1,b2) # feed forward

        cost = -y.dot(np.log(y_pred)) - (1-y).dot(np.log(1-y_pred)) # calculate cost
        epoch_cost += cost # calculate cumulative cost of this epoch

        for k in range(W2.shape[0]): # update W2
            for j in range(W2.shape[1]):
                W2[k,j] -= alpha * (y_pred - y) * z[j,0]

        for k in range(b2.shape[0]): # update b2
            b2[k,0] -= alpha * (y_pred - y)


        for j in range(W1.shape[0]): # update W1
            for i in range(W1.shape[1]):
                for k in range(W2.shape[0]):
                    W1[j,i] -= alpha * (y_pred - y) * W2[k,j] * z[j,0] * (1 - z[j,0]) * x[i]

        for j in range(b1.shape[0]): # update b1
            b1[j,0] -= alpha * (y_pred - y) * W2[k,j] * z[j,0] * (1 - z[j,0])

    print('cost:',epoch_cost)


print('prediction\n',pred(X,W1,W2,b1,b2))
print('ground-truth\n',Y)

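【Question comments】:

Your gradient updates for W1 and b1 should use the original value of W2, not the updated one.
Bug in pred(): it should initialize x inside the loop.
Thanks, I should update the parameters simultaneously, and pred() has a bug.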
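
The two points raised in these comments can be sketched as follows. This is a minimal sketch, not the asker's final code. `sgd_step` is a hypothetical helper name, the vectorized gradients assume the same sigmoid and cross-entropy setup as the question, and `alpha = 0.5`, `epoch = 5000` and the seed are illustrative assumptions rather than values from the thread. No expected output is shown, because a network with two hidden units is not guaranteed to converge on XOR from every initialization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, W2, b1, b2):  # feed forward, as in the question
    a = W1.dot(x)
    z = sigmoid(a + b1)
    b = W2.dot(z)
    y = sigmoid(b + b2)
    return a, z, b, y

def pred(X, W1, W2, b1, b2):
    y_pred = np.zeros((X.shape[0], 1))
    for i in range(X.shape[0]):
        x = X[i, :].reshape((-1, 1))  # take the i-th sample instead of a leftover global x
        _, _, _, y = forward(x, W1, W2, b1, b2)
        y_pred[i, 0] = y[0, 0]
    return y_pred

def sgd_step(x, y, W1, W2, b1, b2, alpha):
    # Compute every gradient from the current parameters first,
    # then return updated copies, so W1 and b1 never see the new W2.
    _, z, _, y_pred = forward(x, W1, W2, b1, b2)
    delta2 = y_pred - y                      # output error, shape (1,1)
    delta1 = W2.T.dot(delta2) * z * (1 - z)  # hidden error, uses the old W2
    W1_new = W1 - alpha * delta1.dot(x.T)
    b1_new = b1 - alpha * delta1
    W2_new = W2 - alpha * delta2.dot(z.T)
    b2_new = b2 - alpha * delta2
    return W1_new, W2_new, b1_new, b2_new

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # features 4 * 2
Y = np.array([[0], [1], [1], [0]])              # labels 4 * 1

np.random.seed(0)                               # assumption: arbitrary seed
epsilon = 0.12
W1 = np.random.random((2, 2)) * epsilon * 2 - epsilon
b1 = np.random.random((2, 1)) * epsilon * 2 - epsilon
W2 = np.random.random((1, 2)) * epsilon * 2 - epsilon
b2 = np.random.random((1, 1)) * epsilon * 2 - epsilon

alpha = 0.5   # assumption: larger than the question's 0.01
epoch = 5000  # assumption: many more passes than the question's 50
for turn in range(epoch):
    for index in range(X.shape[0]):
        x = X[index, :].reshape((-1, 1))
        y = Y[index, :].reshape((-1, 1))
        W1, W2, b1, b2 = sgd_step(x, y, W1, W2, b1, b2, alpha)

print('prediction\n', pred(X, W1, W2, b1, b2))
print('ground-truth\n', Y)
```

Because the original `pred()` read the global `x` left over from the training loop, it evaluated the same sample four times, which by itself makes all four predictions identical regardless of how well the network was trained.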

Tags: python numpy neural-network

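【Solution 1】:

Not a complete answer. I just replaced gradient descent with something similar to a simple evolution strategy (ES). This works, so there is probably no bug in your forward pass.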

# [...] sigmoid(), forward(), pred() not modified

X = np.array([[0,0],[0,1],[1,0],[1,1]]) # features 4 * 2
Y = np.array([[0],[1],[1],[0]]) # labels 4 * 1

W1 = np.zeros((2,2)) # map from input to hidden
b1 = np.zeros((2,1)) # bias1 
W2 = np.zeros((1,2)) # map from hidden to output
b2 = np.zeros((1,1)) # bias2
epoch = 2000 # maximum training turns
for turn in range(epoch):
    print('turn:',turn,end=' ')
    epoch_cost = 0
    for index in range(X.shape[0]):
        x = X[index,:].reshape((-1,1))
        y = Y[index,:].reshape((-1,1))
        a,z,b,y_pred = forward(x,W1,W2,b1,b2) # feed forward

        cost = -y.dot(np.log(y_pred)) - (1-y).dot(np.log(1-y_pred)) # calculate cost
        epoch_cost += cost # calculate cumulative cost of this epoch

    if turn == 0 or epoch_cost < epoch_cost_best:
        epoch_cost_best = epoch_cost
        W1_best = W1
        b1_best = b1
        W2_best = W2
        b2_best = b2

    epsilon = 0.12 # perturb all weights between -0.12 ~ 0.12
    W1 = W1_best + np.random.random((2,2)) * epsilon * 2 - epsilon
    b1 = b1_best + np.random.random((2,1)) * epsilon * 2 - epsilon
    W2 = W2_best + np.random.random((1,2)) * epsilon * 2 - epsilon
    b2 = b2_best + np.random.random((1,1)) * epsilon * 2 - epsilon

    print('cost:',epoch_cost)


print('prediction\n',pred(X,W1_best,W2_best,b1_best,b2_best))
print('ground-truth\n',Y)

【Comments】:

Great! I have never seen this strategy before. Thanks. But pred() has a bug, because I did not initialize x.