【Posted】:2017-09-13 10:08:16
【Question】:
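I am building a sketch of a neural network in Python 3.4, using numpy and matrices, to learn simple XOR. My notation is as follows:
a is the activity of a neuron
z is the input of a neuron
W is a weight matrix of size R^{#neurons in previous layer} x {#neurons in next layer}
B is a vector of bias values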
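To make the notation concrete, here is a minimal sketch of one layer step under these conventions. The layer sizes and the names `n_prev`, `n_next`, and `a_prev` are illustrative assumptions (chosen to match the [2, 3, 1] layout used in the code below), not part of the original network. Because W has shape (previous layer) x (next layer), it is transposed before multiplying the column vector of activities.

```python
import numpy as np

# Hypothetical layer sizes, matching the first two layers of a [2, 3, 1] layout.
n_prev, n_next = 2, 3

rng = np.random.RandomState(0)
W = rng.rand(n_prev, n_next)        # weights: (previous layer) x (next layer)
B = rng.rand(n_next, 1)             # one bias per neuron in the next layer
a_prev = np.array([[0.0], [1.0]])   # activity of the previous layer (column vector)

z = np.dot(W.T, a_prev) + B         # input of the next layer, shape (n_next, 1)
a = 1.0 / (1.0 + np.exp(-z))        # sigmoid activity of the next layer

print(z.shape, a.shape)
```

Both `z` and `a` are column vectors with one entry per neuron in the next layer.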
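After implementing a very simple network in Python, everything works fine when training on only a single input vector. However, when training on all four XOR training examples, the error function shows very strange behavior (see figure) and the output of the network is always roughly 0.5. Changing the network size, the learning rate, or the number of training epochs does not seem to help.
Here is the code of the network: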
import numpy as np
import time
import matplotlib.pyplot as plt

Js = []
start = time.time()
np.random.seed(2)

#Sigmoid
def activation(x, derivative = False):
    if(derivative):
        a = activation(x)
        return a * (1 - a)
    else:
        return 1/(1+np.exp(-x))

def cost(output, target):
    return (1/2) * np.sum((target - output)**2)

INPUTS = np.array([
    [0, 1],
    [1, 0],
    [0, 0],
    [1, 1],
])
TARGET = np.array([
    [1],
    [1],
    [0],
    [0],
])

"Hyper-Parameters"
# Layer Structure
LAYER = [2, 3, 1]
LEARNING_RATE = 0.1
ITERATIONS = int(1e3)

# Init Weights
W1 = np.random.rand(LAYER[0], LAYER[1])
W2 = np.random.rand(LAYER[1], LAYER[2])
# Init Biases
B1 = np.random.rand(LAYER[1], 1)
B2 = np.random.rand(LAYER[2], 1)

for i in range(0, ITERATIONS):
    exampleIndex = i % len(INPUTS)
    #exampleIndex = 2

    "Forward Pass"
    # Layer One Activity (Input layer)
    A0 = np.transpose(INPUTS[exampleIndex:exampleIndex+1])
    # Layer Two Activity (Hidden Layer)
    Z1 = np.dot(np.transpose(W1), A0) + B1
    A1 = activation(Z1)
    # Layer Three Activity (Output Layer)
    Z2 = np.dot(np.transpose(W2), A1) + B2
    A2 = activation(Z2)
    # Output
    O = A2

    # Cost J
    # Target Vector T
    T = np.transpose(TARGET[exampleIndex:exampleIndex+1])
    J = cost(O, T)
    Js.append(J)
    print("J = {}".format(J))
    print("I = {}, O = {}".format(A0, O))

    "Backward Pass"
    # Calculate Delta of output layer
    D2 = (O - T) * activation(Z2, True)
    # Calculate Delta of hidden layer
    D1 = np.dot(W2, D2) * activation(Z1, True)
    # Calculate Derivatives w.r.t. W2
    DerW2 = np.dot(A1, np.transpose(D2))
    # Calculate Derivatives w.r.t. W1
    DerW1 = np.dot(A0, np.transpose(D1))
    # Calculate Derivatives w.r.t. B2
    DerB2 = D2
    # Calculate Derivatives w.r.t. B1
    DerB1 = D1

    "Update Weights and Biases"
    W1 -= LEARNING_RATE * DerW1
    B1 -= LEARNING_RATE * DerB1
    W2 -= LEARNING_RATE * DerW2
    B2 -= LEARNING_RATE * DerB2

# Show prediction
print("Time elapsed {}s".format(time.time() - start))
plt.plot(Js)
plt.ylabel("Cost J")
plt.xlabel("Iterations")
plt.show()
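Since the loop above records the cost of one example per iteration while cycling through four examples with different targets, consecutive values of J are not directly comparable. A possible way to get a smoother picture is to record the cost summed over all four examples once per epoch. The sketch below does this under the same formulas; the helpers `total_cost` and `train` and the 250-epoch setting are assumptions made for illustration, not part of the original script.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

INPUTS = np.array([[0, 1], [1, 0], [0, 0], [1, 1]], dtype=float)
TARGET = np.array([[1], [1], [0], [0]], dtype=float)

def total_cost(W1, B1, W2, B2):
    """Squared-error cost summed over all four XOR examples."""
    A1 = sigmoid(np.dot(W1.T, INPUTS.T) + B1)   # shape (3, 4), one column per example
    A2 = sigmoid(np.dot(W2.T, A1) + B2)         # shape (1, 4)
    return 0.5 * np.sum((TARGET.T - A2) ** 2)

def train(n_epochs, learning_rate, seed=2):
    """Online updates, one example at a time; returns one cost value per epoch."""
    rng = np.random.RandomState(seed)
    W1, W2 = rng.rand(2, 3), rng.rand(3, 1)
    B1, B2 = rng.rand(3, 1), rng.rand(1, 1)
    epoch_costs = []
    for _ in range(n_epochs):
        for k in range(len(INPUTS)):
            A0 = INPUTS[k:k + 1].T
            T = TARGET[k:k + 1].T
            # Forward pass
            A1 = sigmoid(np.dot(W1.T, A0) + B1)
            A2 = sigmoid(np.dot(W2.T, A1) + B2)
            # Backward pass (sigmoid derivative written as a * (1 - a))
            D2 = (A2 - T) * A2 * (1 - A2)
            D1 = np.dot(W2, D2) * A1 * (1 - A1)
            # Gradient step
            W1 -= learning_rate * np.dot(A0, D1.T)
            B1 -= learning_rate * D1
            W2 -= learning_rate * np.dot(A1, D2.T)
            B2 -= learning_rate * D2
        epoch_costs.append(total_cost(W1, B1, W2, B2))
    return epoch_costs

# Hypothetical settings: 250 epochs of 4 examples is the same 1000 updates as above.
epoch_costs = train(n_epochs=250, learning_rate=0.1)
print("first epoch: {:.4f}, last epoch: {:.4f}".format(epoch_costs[0], epoch_costs[-1]))
```

Plotting `epoch_costs` instead of `Js`, and rerunning `train` with other values of `n_epochs` and `learning_rate`, should make it easier to see whether the total cost is actually stuck or only decreasing slowly.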
What could be the reason for this strange behavior in my implementation?
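One way to narrow this down is a finite-difference gradient check, which compares the backward-pass derivatives against numerical estimates of the same derivatives. The sketch below re-implements the formulas from the code above in standalone helpers; the names `forward_cost`, `backprop_grads`, and `numeric_grad` and the step size `eps` are assumptions for illustration. It checks a single example and does not by itself identify the cause.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_cost(W1, B1, W2, B2, a0, t):
    """Cost J for one example, using the same forward pass and cost as above."""
    a1 = sigmoid(np.dot(W1.T, a0) + B1)
    a2 = sigmoid(np.dot(W2.T, a1) + B2)
    return 0.5 * np.sum((t - a2) ** 2)

def backprop_grads(W1, B1, W2, B2, a0, t):
    """Analytic gradients (dW1, dB1, dW2, dB2), written as in the backward pass above."""
    a1 = sigmoid(np.dot(W1.T, a0) + B1)
    a2 = sigmoid(np.dot(W2.T, a1) + B2)
    d2 = (a2 - t) * a2 * (1 - a2)
    d1 = np.dot(W2, d2) * a1 * (1 - a1)
    return np.dot(a0, d1.T), d1, np.dot(a1, d2.T), d2

def numeric_grad(f, P, eps=1e-6):
    """Central-difference estimate of dJ/dP, perturbing one entry of P at a time."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = f()
        P[idx] = old - eps
        minus = f()
        P[idx] = old            # restore the original value
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.RandomState(2)
W1, W2 = rng.rand(2, 3), rng.rand(3, 1)
B1, B2 = rng.rand(3, 1), rng.rand(1, 1)
a0 = np.array([[0.0], [1.0]])   # one XOR input as a column vector
t = np.array([[1.0]])           # its target

# f reads the parameter arrays by reference, so in-place perturbations are seen.
f = lambda: forward_cost(W1, B1, W2, B2, a0, t)
analytic = backprop_grads(W1, B1, W2, B2, a0, t)
max_diff = max(
    np.max(np.abs(g - numeric_grad(f, P)))
    for g, P in zip(analytic, (W1, B1, W2, B2))
)
print("largest analytic/numeric difference:", max_diff)
```

If the difference is tiny, the backward pass is consistent with the cost function for that example, and the remaining suspects would be things like the number of updates, the learning rate, or the weight initialization.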
【Discussion】:
Tags: python numpy machine-learning neural-network artificial-intelligence