【Title】: Neural network XOR gate not learning
【Posted】: 2016-11-28 09:47:16
【Question】:

I am trying to build an XOR gate using a network of two perceptrons, but for some reason the network is not learning: when I plot the change in error, the error settles at a fixed level and oscillates around that region.

At the moment I have not added any bias to the network.

import numpy as np

def S(x):
    return 1/(1+np.exp(-x))

win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15

# win = [[1,1], [2,2]]
# wout = [[1],[2]]

obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]

epoch = int(10000)
emajor = ""

for r in range(0,epoch):
    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]

        fdata = S(np.dot(1,fdata))

        hnw = np.dot(fdata,win)

        hnw = S(np.dot(fdata,win))

        out = np.dot(hnw,wout)

        out = S(out)

        diff = tar-out

        E = 0.5 * np.power(diff,2)
        emajor += str(E[0]) + ",\n"

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta

        wout_change = np.dot(nindelta_out[0], hnw)

        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in

        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]



f = open('xor.csv','w')
f.write(emajor) # python will convert \n to os.linesep
f.close() # you can omit in most cases as the destructor will call it

Here is the error plotted against the number of training epochs. Is it correct? The red line is how I expected the error to change.

What am I doing wrong in the code? I cannot seem to figure out what is causing the error. Help is much appreciated.

Thanks in advance.

【Comments】:

Tags: python numpy machine-learning neural-network artificial-intelligence


【Solution 1】:

Here is a one-hidden-layer network with backpropagation, which can be customized to run experiments with relu, sigmoid, and other activations. After several experiments the conclusion was that the network performs better with relu and reaches convergence faster, while with sigmoid the loss value fluctuates. This happens because "the gradient of sigmoids becomes increasingly small as the absolute value of x increases".
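The vanishing-gradient claim can be checked numerically. A minimal sketch (the `sigmoid_prime` helper is mine, not part of the answer's code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: s(z) * (1 - s(z)); it peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(z, sigmoid_prime(z))  # the gradient shrinks rapidly as |z| grows
```

At z = 0 the gradient is exactly 0.25; by z = 10 it has fallen below 1e-4, which is why sigmoid layers can learn very slowly once activations saturate.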

import numpy as np
import matplotlib.pyplot as plt
from operator import xor

class neuralNetwork():
    def __init__(self):
        # Define hyperparameters
        self.noOfInputLayers = 2
        self.noOfOutputLayers = 1
        self.noOfHiddenLayerNeurons = 2

        # Define weights
        self.W1 = np.random.rand(self.noOfInputLayers,self.noOfHiddenLayerNeurons)
        self.W2 = np.random.rand(self.noOfHiddenLayerNeurons,self.noOfOutputLayers)

    def relu(self,z):
        return np.maximum(0,z)

    def sigmoid(self,z):
        return 1/(1+np.exp(-z))

    def forward (self,X):
        self.z2 = np.dot(X,self.W1)
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2,self.W2)
        yHat = self.relu(self.z3)
        return yHat

    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J

    def costFunctionPrime(self,X,y):
        # Compute derivative with respect to W1 and W2
        # (note: sigmoid(z) is used as the activation-derivative term here,
        # as posted, rather than the exact gradient of the relu in forward())
        delta3 = np.multiply(-(y-self.yHat),self.sigmoid(self.z3))
        djw2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3,self.W2.T)*self.sigmoid(self.z2)
        djw1 = np.dot(X.T,delta2)

        return djw1,djw2


if __name__ == "__main__":

    EPOCHS = 6000
    SCALAR = 0.01

    nn= neuralNetwork()    
    COST_LIST = []

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]

    for epoch in range(1,EPOCHS):
        cost = 0
        for i in inputs:
            X = i #inputs
            y = xor(X[0][0],X[0][1])
            cost += nn.costFunction(X,y)[0]
            djw1,djw2 = nn.costFunctionPrime(X,y)
            nn.W1 = nn.W1 - SCALAR*djw1
            nn.W2 = nn.W2 - SCALAR*djw2
        COST_LIST.append(cost)

    plt.plot(np.arange(1,EPOCHS),COST_LIST)
    plt.ylim(0,1)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title(str('Epochs: '+str(EPOCHS)+', Scalar: '+str(SCALAR)))
    plt.show()

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]
    print("X\ty\ty_hat")
    for inp in inputs:
        print((inp[0][0], inp[0][1]), "\t", xor(inp[0][0], inp[0][1]), "\t", round(nn.forward(inp)[0][0], 4))

Final result:

X       y       y_hat
(0, 0)  0       0.0
(0, 1)  1       0.9997
(1, 0)  1       0.9997
(1, 1)  0       0.0005

The weights obtained after training were:

nn.W1

[ [-0.81781753  0.71323677]
  [ 0.48803631 -0.71286155] ]

nn.W2

[ [ 2.04849235]
  [ 1.40170791] ]

I found the following YouTube series very helpful for understanding neural networks: Neural networks demystified

The little I know cannot all be explained in this answer. If you want a better understanding of neural networks, I suggest you go through the following link: cs231n: Modelling one neuron

【Comments】:

    【Solution 2】:

    The error computed for each epoch should be the sum of all the squared errors, i.e. the error for every target.

    import numpy as np
    def S(x):
        return 1/(1+np.exp(-x))
    win = np.random.randn(2,2)
    wout = np.random.randn(2,1)
    eta = 0.15
    # win = [[1,1], [2,2]]
    # wout = [[1],[2]]
    obj = [[0,0],[1,0],[0,1],[1,1]]
    target = [0,1,1,0]    
    epoch = int(10000)
    emajor = ""
    
    for r in range(0,epoch):
    
        # ***** initialize final error *****
        finalError = 0
    
        for xy in range(len(target)):
            tar = target[xy]
            fdata = obj[xy]
    
            fdata = S(np.dot(1,fdata))
    
            hnw = np.dot(fdata,win)
    
            hnw = S(np.dot(fdata,win))
    
            out = np.dot(hnw,wout)
    
            out = S(out)
    
            diff = tar-out
    
            E = 0.5 * np.power(diff,2)
    
            # ***** sum all errors *****
            finalError += E
    
            delta_out = (out-tar)*(out*(1-out))
            nindelta_out = delta_out * eta
    
            wout_change = np.dot(nindelta_out[0], hnw)
    
            for x in range(len(wout_change)):
                change = wout_change[x]
                wout[x] -= change
    
            delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
            nindelta_in = eta * delta_in
    
            for x in range(len(nindelta_in)):
                midway = np.dot(nindelta_in[x][0], fdata)
                for y in range(len(win)):
                    win[y][x] -= midway[y]
    
        # ***** Save final error *****
        emajor += str(finalError[0]) + ",\n"
    
    
    f = open('xor.csv','w')
    f.write(emajor) # python will convert \n to os.linesep
    f.close() # you can omit in most cases as the destructor will call it
    

    【Comments】:

    • Hi, thanks for the answer, but when I plot the error, every plot is different. Why is that? Is that possible?
    • Yes, that is because of the random initial weights, which change every time the program starts. For more information, here is a good link for understanding backpropagation better: mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
    • Thanks, yes, I have read that post several times to understand this. Don't you think the line delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout) is incorrect? I computed it by hand, and the output of this line is not what is needed; maybe I am using numpy.dot in the wrong way here, don't you think?
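    The commenter's suspicion is worth checking: for 1-D arrays, np.dot(hnw, 1-hnw) collapses to a single scalar inner product, whereas the sigmoid derivative needs the elementwise product hnw * (1-hnw), one value per hidden neuron. A minimal sketch of the difference (the array values here are illustrative, not from the training run):

```python
import numpy as np

hnw = np.array([0.2, 0.9])          # example hidden-layer activations

inner = np.dot(hnw, 1 - hnw)        # scalar inner product: 0.2*0.8 + 0.9*0.1 (~0.25)
elementwise = hnw * (1 - hnw)       # per-neuron derivative: [~0.16, ~0.09]

print(inner)
print(elementwise)
```

    With np.dot, every hidden neuron ends up scaled by the same scalar, so the per-neuron gradient information is lost; the elementwise form is what the backpropagation derivation calls for.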