【Question Title】: Simple ANN model converges with tanh(x) as the activation function, but it doesn't with leaky ReLU
【Posted】: 2020-04-26 02:06:02
【Question】:

I am training a simple ANN model (an MLP) using tanh(x) as the activation function, and after some iterations it converges with an error of 10^-5. Here is my full code:

import numpy as np
import pandas as pd

# Dataset to train on
x = pd.DataFrame(
    [[1],
    [2],
    [3]],
    columns=['valores x'])

d = pd.DataFrame(
    [[5],
    [4],
    [3]],
    columns=['valores desejados'])

# Convert the dataframes to arrays and normalize the desired values to lie between 0 and +1.
x = x.to_numpy()
d = d/(1.05*d.max())
d = d.to_numpy()


# Derivative of tanh(x): sech²(x) = 1 - (tanh(x))²
def df(x):
    y = 1 - np.power(np.tanh(x), 2)
    return y

#def rede_mlp(n, x, d, net, k, precisao):

# Building the two-layer network
# net = number of neurons in the first layer
# n = learning rate
# precisao = target precision of the mean squared error
net=3
n = 0.1
precisao=0.00001
w1 = np.random.rand(net,len(x[0]))
w2 = np.random.rand(1,net)
E_M=1
epocas=0

while E_M>precisao:
    E_M=0
    errofinal=0
    for i in range(0,len(x)):

        # FORWARD
        i1 = np.matmul(w1, x[i].reshape(len(x[i]),1))
        y1 = np.tanh(i1)

        i2 = np.matmul(w2, y1)
        y2 = np.tanh(i2)

        # error against the desired value
        erro = d[i].reshape(len(d[i]),1) - y2

        # BACKPROPAGATION
        delta_2 = erro*df(i2)
        w2 = w2 + n*(np.matmul(delta_2, y1.reshape(1, net)))

        delta_1 = (np.matmul(w2.T, delta_2))*df(i1)
        w1 = w1 + n*(np.matmul(delta_1, x[i].reshape(1, len(x[i]))))

        errofinal = errofinal + 0.5*erro**2

    E_M = errofinal/len(x)
    epocas+=1
    print(E_M)
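(A quick sanity check, not part of the original question: the hand-written derivative df above can be verified against a central finite difference of np.tanh.)

```python
import numpy as np

def df(x):
    # derivative of tanh(x) = 1 - tanh(x)^2
    return 1 - np.power(np.tanh(x), 2)

# central finite-difference approximation at a few sample points
xs = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
h = 1e-6
numeric = (np.tanh(xs + h) - np.tanh(xs - h)) / (2 * h)

# the two should agree to high precision
print(np.max(np.abs(numeric - df(xs))))
```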

After that, I tried changing the activation function to leaky ReLU, but it does not converge. I have changed the learning rate n several times, but the error stays high, at about 7.95, which is large for my data. Here is my attempt:

import numpy as np
import pandas as pd


# Dataset to train on
x = pd.DataFrame(
    [[1],
    [2],
    [3]],
    columns=['valores x'])

d = pd.DataFrame(
    [[5],
    [4],
    [3]],
    columns=['valores desejados'])

# Convert the dataframes to arrays (note: the desired values are not normalized in this version)
x = x.to_numpy()
d = d.to_numpy()


def df(x):
    x = np.array(x)
    x[x<=0] = 0.01
    x[x>0] = 1
    return x

def f(x):
    return(np.where(x > 0, x, x * 0.01))



#def rede_mlp(n, x, d, net, k, precisao):

# Building the two-layer network
# net = number of neurons in the first layer
# n = learning rate
# precisao = target precision of the mean squared error
net=3
n = 1e-4
precisao=0.0001
w1 = np.random.rand(net,len(x[0]))
w2 = np.random.rand(1,net)
E_M=20
epocas=0

while E_M>precisao:
    E_M=0
    errofinal=0
    for i in range(0,len(x)):

        # FORWARD
        i1 = np.matmul(w1, x[i].reshape(len(x[i]),1))
        y1 = f(i1)



        i2 = np.matmul(w2, y1)
        y2 = f(i2)


        # error against the desired value
        erro = d[i].reshape(len(d[i]),1) - y2


        # BACKPROPAGATION
        delta_2 = erro*df(i2)
        w2 = w2 + n*(np.matmul(delta_2, y1.reshape(1, net)))


        delta_1 = (np.matmul(w2.T, delta_2))*df(i1)
        w1 = w1 + n*(np.matmul(delta_1, x[i].reshape(1, len(x[i]))))

        errofinal = errofinal + 0.5*erro**2

    #E_M = errofinal/len(x)
    E_M = errofinal
    epocas+=1
    print(E_M)

EDITED

After some changes, here is my leaky ReLU code (but the error is still high, ~7.77):

import numpy as np
import pandas as pd


# Dataset to train on
x = pd.DataFrame(
    [[1],
    [2],
    [3]],
    columns=['valores x'])

d = pd.DataFrame(
    [[5],
    [4],
    [3]],
    columns=['valores desejados'])

# Convert the dataframes to arrays (note: the desired values are not normalized in this version)
x = x.to_numpy()
d = d.to_numpy()


def df(x):
    return(np.where(x <= 0, 0.01, 1))

def f(x):
    return(np.where(x > 0, x, x * 0.01))


#def rede_mlp(n, x, d, net, k, precisao):

# Building the two-layer network
# net = number of neurons in the first layer
# n = learning rate
# precisao = target precision of the mean squared error
net=3
n = 1e-3
precisao=0.1
w1 = np.random.rand(net,len(x[0]))
w2 = np.random.rand(1,net)
E_M=20
epocas=0

while E_M>precisao:
    E_M=0
    errofinal=0
    for i in range(0,len(x)):

        # FORWARD
        i1 = np.matmul(w1, x[i].reshape(len(x[i]),1))
        y1 = f(i1)


        i2 = np.matmul(w2, y1)
        y2 = f(i2)


        # error against the desired value
        erro = d[i].reshape(len(d[i]),1) - y2


        # BACKPROPAGATION
        delta_2 = erro*df(i2)
        delta_1 = (np.matmul(w2.T, delta_2))*df(i1)

        w2 = w2 + n*(np.matmul(delta_2, y1.reshape(1, net)))
        w1 = w1 + n*(np.matmul(delta_1, x[i].reshape(1, len(x[i]))))


        errofinal = errofinal + 0.5*erro**2

    #E_M = errofinal/len(x)
    E_M = errofinal
    epocas+=1
    print(E_M)

【Question Discussion】:

    Tags: python numpy neural-network mlp relu


    【Solution 1】:

    You need to add a bias to the network.

    The equation you are trying to model is y = 6 - x, which is trivial if you can use 6 as the intercept (bias), but I think it is actually impossible if you don't.

    Once a bias is added, many functions become much easier to represent, which is why including one is standard practice. This Q&A on the role of bias in NNs explains it thoroughly.
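    A minimal illustration of that point (my own sketch, not part of the original answer): with an intercept term, a single linear unit represents y = 6 - x exactly, while a bias-free unit y = w*x is forced through the origin and cannot fit these three points.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
d = np.array([5.0, 4.0, 3.0])   # targets follow y = 6 - x

w, b = -1.0, 6.0                # weight and intercept (bias)
y = w * x + b
print(y)                        # matches d exactly

# without a bias, even the best least-squares slope through the origin misses badly
w_nobias = (x @ d) / (x @ x)
print(w_nobias * x)
```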

    I modified your code to add a bias and to follow more typical naming conventions, and it converges for me.

    net = 3
    n = 1e-3
    precisao = 0.0001 
    
    w1 = np.random.rand(net, len(x[0])) 
    bias1 = np.random.rand()
    
    w2 = np.random.rand(1, net) 
    bias2 = np.random.rand()
    
    E_M = 20 
    epocas = 0 
    
    while E_M > precisao: 
        E_M = 0 
        errofinal = 0 
        for i in range(0,len(x)): 
            a0 = x[i].reshape(-1, 1) 
            targ = d[i].reshape(-1, 1) 
    
            z1 = np.matmul(w1, a0) + bias1
            a1 = f(z1) 
    
            z2 = np.matmul(w2, a1) + bias2
            a2 = f(z2) 
    
            erro = a2 - targ
    
            # BACKPROPAGATION 
            delta_2 = erro * df(z2) 
            delta_1 = np.matmul(w2.T, delta_2) * df(z1) 
            bias2 -= n * delta_2
            bias1 -= n * delta_1
            w2 -= n * np.matmul(delta_2, a1.T)
            w1 -= n * np.matmul(delta_1, a0.T)
    
            errofinal = errofinal + 0.5*erro**2 
    
        #E_M = errofinal/len(x) 
        E_M = errofinal 
        epocas += 1 
        if epocas % 1000 == 0:
            print(epocas, E_M) 
    

    I bumped up the learning rate so that it converges faster.

    1000 [[0.14401507]]
    2000 [[0.00028834]]
    

    Earlier bug-fix suggestion

    You are setting the derivative to always equal 1.

    def df(x):
        x = np.array(x)
        x[x<=0] = 0.01
        x[x>0] = 1
        return x
    

    The line x[x<=0] = 0.01 sets every non-positive value to the positive value 1/100. After that, every value is positive: values that were already positive are untouched, and the negative or zero values have just been made positive. So the next line x[x>0] = 1 sets all of the derivatives to 1.
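    The effect is easy to reproduce (a small demonstration I added, using the df from the question):

```python
import numpy as np

def df(x):
    x = np.array(x)
    x[x <= 0] = 0.01   # non-positive entries become 0.01 -- which is positive...
    x[x > 0] = 1       # ...so this line then overwrites every entry with 1
    return x

print(df([-2.0, 0.0, 3.0]))   # [1. 1. 1.] -- the leaky slope is lost
```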

    Try this instead:

    def df(x):
        return np.where(np.array(x) <= 0, 0.01, 1)
    

    【Comments】:

    • Thanks, but I'm not sure that is the whole problem. The error is still too high (7.90). Do you see any other issue in the code?
    • Yes. You are updating w2 with w2 = w2 + n*(np.matmul(delta_2, y1.reshape(1, net))) before computing the w1 update, which uses delta_1 = (np.matmul(w2.T, delta_2))*df(i1).
    • Yes, but I changed that line (it was incorrect) and the problem persists. Sorry, I don't follow your last comment; w2 should be updated before w1.
    • No, it shouldn't. All the weights should be updated simultaneously.
    • Sorry, you are right. I moved the delta_1 line and now the error is 7.7792246. Does this code get a better error for you?