Wrong dimensions in an XOR neural network (Python): answers

【Question title】: Wrong dimensions XOR neural network python
【Posted】: 2020-02-10 10:01:11
【Question description】:

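I am trying to build an XOR neural network with one hidden layer in Python, but I am running into a dimension problem. I cannot figure out why I get the wrong dimensions in the first place, because the math looks correct to me.

The dimension problem starts in the backpropagation section, where it is marked with a comment. The specific error is: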

  File "nn.py", line 52, in <module>
    d_a1_d_W1 = inp * deriv_sigmoid(z1) 
  File "/usr/local/lib/python3.7/site-packages/numpy/matrixlib/defmatrix.py", line 220, in __mul__
    return N.dot(self, asmatrix(other))
ValueError: shapes (1,2) and (3,1) not aligned: 2 (dim 1) != 3 (dim 0)

Also, why does the deriv_sigmoid function here only work when I cast to a numpy array?

Code:


import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):

  fx = np.array(sigmoid(x)) # gives dimensions issues unless I cast to array
  return fx * (1 - fx)

hiddenNeurons = 3
outputNeurons = 1
inputNeurons = 2

X = np.array( [ [0, 1]  ])
elem = np.matrix(X[0])
elem_row, elem_col = elem.shape


y = np.matrix([1])

W1 = np.random.rand(hiddenNeurons, elem_col)
b1 = np.random.rand(hiddenNeurons, 1)
W2 = np.random.rand(outputNeurons, hiddenNeurons)
b2 = np.random.rand(outputNeurons, 1)
lr = .01



for inp, ytrue in zip(X, y):
    inp = np.matrix(inp)

    # feedforward
    z1 = W1 * inp.T + b1 # get weight matrix1 * inputs + bias1
    a1 = sigmoid(z1) # get activation of hidden layer

    z2 = W2 * a1 + b2 # get weight matrix2 * activated hidden + bias 2
    a2 = sigmoid(z2) # get activated output 
    ypred = a2 # and call it ypred (y prediction)

    # backprop
    d_L_d_ypred = -2 * (ytrue - ypred) # derivative of mean squared error loss

    d_ypred_d_W2 = a1 * deriv_sigmoid(z2) # derivative of y prediction with respect to weight matrix 2
    d_ypred_d_b2 = deriv_sigmoid(z2) # derivative of y prediction with respect to bias 2

    d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation

    d_a1_d_W1 = inp * deriv_sigmoid(z1) # dimensions issue starts here ––––––––––––––––––––––––––––––––

    d_a1_d_b1 = deriv_sigmoid(b1) 

    W1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_W1
    b1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_b1
    W2 -= lr * d_L_d_ypred * d_ypred_d_W2
    b2 -= lr * d_L_d_ypred * d_ypred_d_b2

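【Question discussion】:

Is it absolutely necessary to use numpy matrices? That may not be the only cause of the problem, but the general consensus seems to be that ndarray is the better choice. The docs state: "It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future."
Thanks. I have actually tried replacing everything with np.array, but I still get the same error.
OK, I'll try to take a look at the code :) I don't know much about neural networks though, so no guarantees!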

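Tags: python numpy neural-network xor sigmoid

【Solution 1】:

I have never worked with neural networks, so I don't fully understand what you are trying to do.

If a and b are matrices rather than numpy arrays, I suspect there is some confusion about how a * b works. On numpy arrays, * performs element-wise multiplication; on np.matrix objects, it performs matrix multiplication.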

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = a - 1
print(b)
# [[0 1]
#  [2 3]]

a * b     # Element-wise multiplication
# array([[ 0,  2],     [[ 1*0, 2*1 ],
#        [ 6, 12]])     [ 3*2, 4*3 ]]

am = np.matrix(a)
bm = np.matrix(b)

am * bm  # Matrix (dot) multiplication: rows of am times columns of bm
# matrix([[ 4,  7],    [[ 1*0+2*2, 1*1+2*3],
#         [ 8, 15]])    [ 3*0+4*2, 3*1+4*3]]

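In the deriv_sigmoid function (without the np.array cast), if x is a matrix then fx is a matrix of the same shape, (3,1). fx * (1 - fx) raises an exception when fx is a (3,1) matrix, because two (3,1) matrices cannot be matrix-multiplied.

The same problem applies to the "# backprop" part of the code.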
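The behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a hand-picked (3,1) matrix `z` standing in for `z1` from the question; the names `matrix_product_ok`, `elementwise`, and `as_array` are illustrative and not part of the original code.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Assumed stand-in for z1: a (3, 1) np.matrix
z = np.matrix([[0.0], [1.0], [-1.0]])

fx = sigmoid(z)                       # still an np.matrix of shape (3, 1)
try:
    fx * (1 - fx)                     # matrix product of (3,1) by (3,1): not aligned
    matrix_product_ok = True
except ValueError as err:
    matrix_product_ok = False
    print("matrix * matrix failed:", err)

# Two ways to get the element-wise product that the derivative needs
elementwise = np.multiply(fx, 1 - fx)  # np.multiply is element-wise even for np.matrix
fx_arr = np.asarray(fx)                # view the same data as a plain ndarray
as_array = fx_arr * (1 - fx_arr)       # * is element-wise on ndarrays
```

This is why the cast to an array makes the function work: after the cast, `*` no longer means a matrix product.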

d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation
# If W2 is an np.matrix, W2 * deriv_sigmoid(z2) fails: (1,3) and (1,1) are incompatible for matrix multiplication.
# (In the posted code W2 is a plain ndarray, so this line broadcasts to shape (1,3) instead of raising.)
# deriv_sigmoid(z2) * W2 would work, but I guess would return incorrect values (and shape).

d_a1_d_W1 = inp * deriv_sigmoid(z1)
# This fails for the same reason. The shapes of inp (1,2) and deriv_sigmoid(z1) (3,1) are incompatible.
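To make the shape mismatch on that last line concrete, here is a small sketch. It assumes `dz1` as a stand-in for `deriv_sigmoid(z1)` (all ones, only the shape matters), and it assumes the intended quantity is a gradient with the same shape as W1, which the question does not state explicitly.

```python
import numpy as np

inp = np.matrix([[0, 1]])        # (1, 2), as built inside the question's loop
dz1 = np.ones((3, 1))            # assumed stand-in for deriv_sigmoid(z1), shape (3, 1)

try:
    inp * dz1                    # np.matrix turns * into a dot product: (1,2) by (3,1)
    failed = False
except ValueError:
    failed = True                # same "shapes not aligned" error as in the traceback

# With plain ndarrays and an explicit matrix product, the (3,1) local gradient
# times the (1,2) input row gives a (3,2) array, the same shape as W1.
grad_shape_like_W1 = dz1 @ np.asarray(inp)
```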

Unless you need matrix multiplication, I think using np.arrays will make the programming easier.
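For comparison, here is one way the same network could be written with plain ndarrays, using `@` wherever a matrix product is meant and `*` only for element-wise products. This is a sketch under stated assumptions, not the asker's intended code: the helper names `forward` and `sgd_step`, the fixed seed, the learning rate of 0.5, the epoch count, and the full four-row XOR table are all choices made for illustration, and the update rule is the standard chain rule for a squared-error loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)
    return fx * (1.0 - fx)                 # element-wise, since x is an ndarray

def forward(x_col, W1, b1, W2, b2):
    z1 = W1 @ x_col + b1                   # (3,2) @ (2,1) + (3,1) -> (3,1)
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2                      # (1,3) @ (3,1) + (1,1) -> (1,1)
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def sgd_step(x_row, ytrue, W1, b1, W2, b2, lr):
    """One gradient step on a single sample; updates the arrays in place."""
    x_col = x_row.reshape(-1, 1)           # (2,1) column vector
    z1, a1, z2, ypred = forward(x_col, W1, b1, W2, b2)
    # Error signal at the output, then propagated back to the hidden layer
    delta2 = -2.0 * (ytrue - ypred) * deriv_sigmoid(z2)   # (1,1)
    delta1 = (W2.T @ delta2) * deriv_sigmoid(z1)          # (3,1)
    W2 -= lr * delta2 @ a1.T               # (1,1) @ (1,3) -> (1,3), shape of W2
    b2 -= lr * delta2
    W1 -= lr * delta1 @ x_col.T            # (3,1) @ (1,2) -> (3,2), shape of W1
    b1 -= lr * delta1
    return ((ytrue - ypred) ** 2).item()   # squared error before the update

rng = np.random.default_rng(0)             # assumed seed, for repeatability only
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

W1, b1 = rng.random((3, 2)), rng.random((3, 1))
W2, b2 = rng.random((1, 3)), rng.random((1, 1))
lr = 0.5                                   # assumed; the question uses .01

for epoch in range(1000):                  # assumed epoch count
    for x_row, ytrue in zip(X, y):
        sgd_step(x_row, ytrue, W1, b1, W2, b2, lr)
```

Every gradient here has the same shape as the parameter it updates, which is a quick check to run whenever a dimension error shows up in backprop. Whether the network actually learns XOR depends on the initialization and hyperparameters, so none of these values should be read as tuned.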
