【Question title】: Trying to understand deep RNN weights
【Posted】: 2020-11-03 08:35:51
【Question】:

I am trying to understand which weights an RNN trains. For a simple RNN with 1 unit it is easy to follow. For example, if the input shape is [50, 3] (50 time steps, 3 features), there are 3 weights to train (one per feature), plus the bias and the weight for the recurrent state, giving 5 parameters. But I have trouble understanding why the parameter count becomes 12, 21, and 32 as the number of RNN units increases. Thanks for any guidance.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(1, return_sequences=False, input_shape=[50, 3]),  # 3 features, 1 unit
    Dense(1)
])

model.summary()

model2 = Sequential([
    SimpleRNN(2, return_sequences=False, input_shape=[50, 3]),
    Dense(1)  # the last layer does not need return_sequences
])

model2.summary()

model3 = Sequential([
    SimpleRNN(3, return_sequences=False, input_shape=[50, 3]),
    Dense(1)  # the last layer does not need return_sequences
])

model3.summary()

model4 = Sequential([
    SimpleRNN(4, return_sequences=False, input_shape=[50, 3]),
    Dense(1)  # the last layer does not need return_sequences
])

model4.summary()
Model: "sequential_20"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_22 (SimpleRNN)    (None, 1)                 5         
_________________________________________________________________
dense_18 (Dense)             (None, 1)                 2         
=================================================================
Total params: 7
Trainable params: 7
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_23 (SimpleRNN)    (None, 2)                 12        
_________________________________________________________________
dense_19 (Dense)             (None, 1)                 3         
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_22"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_24 (SimpleRNN)    (None, 3)                 21        
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 4         
=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_25 (SimpleRNN)    (None, 4)                 32        
_________________________________________________________________
dense_21 (Dense)             (None, 1)                 5         
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________

【Discussion】:

    Tags: tensorflow keras deep-learning recurrent-neural-network


    【Solution 1】:

    For your model 2:

    model2 = Sequential([
        SimpleRNN(2, return_sequences=False, input_shape=[50, 3]),
        Dense(1)  # the last layer does not need return_sequences
    ])
    

    The figure below shows the weights of one of the two neurons (5 weights: 3 for the input features and 2 for the recurrent state), and each neuron also has 1 bias. So each neuron has 6 parameters, and the total parameter count is 6 * 2 = 12.

    The formula for your example is:
    h * (3 + h) + h
    where (3 + h) is the number of weights per neuron and the final h adds the biases to the parameter count.
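    The formula can be checked against all four model summaries with a few lines of plain Python (the Dense layer on top adds h weights plus 1 bias, which accounts for the second row of each summary):

    ```python
    def simple_rnn_params(units, features=3):
        # input kernel (units * features) + recurrent kernel (units * units) + bias (units)
        return units * (features + units) + units

    def dense_params(units_in, units_out=1):
        # weight matrix plus one bias per output unit
        return units_in * units_out + units_out

    for h in [1, 2, 3, 4]:
        rnn, dense = simple_rnn_params(h), dense_params(h)
        print(f"units={h}: rnn={rnn}, dense={dense}, total={rnn + dense}")
        # units=1: rnn=5,  dense=2, total=7
        # units=2: rnn=12, dense=3, total=15
        # units=3: rnn=21, dense=4, total=25
        # units=4: rnn=32, dense=5, total=37
    ```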

    【Discussion】:

    • Thank you very much, it makes sense now.
    • Is each "neuron" a separate RNN cell, with each cell processing one input, or are all the inputs concatenated into a single input for a single RNN cell? I'm confused, and most explanations online leave a lot out.
    • @Matt I'm not sure I understand your question, but I hope this helps: an RNN is a layer, not a single neuron. For example, in the figure above, the current time step (red circles) contains one layer of RNN units, which takes one layer of inputs (blue circles) and its previous state (grey circles). In fact, the grey circles are not separate things; they are the red circles from the previous time step.
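    To make the comment above concrete, here is a minimal NumPy sketch of one SimpleRNN forward step for model 2 (the names `Wx`, `Wh`, and `b` are my own; Keras calls them kernel, recurrent_kernel, and bias). All 3 input features feed every unit, and the whole layer updates at once:

    ```python
    import numpy as np

    features, units = 3, 2
    Wx = np.zeros((features, units))  # input kernel:    3 * 2 = 6 weights
    Wh = np.zeros((units, units))     # recurrent kernel: 2 * 2 = 4 weights
    b = np.zeros(units)               # bias:                     2 weights

    x_t = np.ones(features)   # input at the current time step (blue circles)
    h_prev = np.zeros(units)  # state from the previous step (grey circles)

    # one recurrent step: every unit sees all inputs and all previous states
    h_t = np.tanh(x_t @ Wx + h_prev @ Wh + b)

    total_params = Wx.size + Wh.size + b.size
    print(total_params)  # 12, matching the summary for SimpleRNN(2)
    ```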