TensorFlow RNN weight matrices initialization: answers

【Question title】: TensorFlow RNN weight matrices initialization
【Posted】: 2023-05-08 10:24:01
【Question description】:

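I'm using bidirectional_rnn with GRUCell, but this is a general question about RNNs in TensorFlow.

I couldn't find out how the weight matrices (input to hidden, hidden to hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?

EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I currently don't know how to do that without saving all the states and restoring the entire model.

Thanks.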

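【Question discussion】:

Tags: tensorflow recurrent-neural-network

【Solution 1】:

RNN models create their variables with get_variable, and you can control the initialization by wrapping the code that creates those variables in a variable_scope and passing a default initializer to it. Unless the RNN specifies one explicitly (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.

You should also be able to share model weights by declaring a second model and passing reuse=True to its variable_scope. As long as the namespaces match, the new model gets the same variables as the first one.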
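
A minimal sketch of both points, assuming the TF 1.x graph-mode API (reached here through tensorflow.compat.v1 so that it also runs on a TF 2.x install). The helper build_weights, the scope name "encoder" and the sizes are hypothetical stand-ins for the variables an RNN cell would create; they are not part of the answer.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # variable_scope/get_variable sharing is graph-mode behaviour


def build_weights(input_size, num_units):
    """Hypothetical stand-in for the variables an RNN cell creates via get_variable."""
    w_xh = tf.get_variable("w_xh", [input_size, num_units])  # input to hidden
    w_hh = tf.get_variable("w_hh", [num_units, num_units])   # hidden to hidden
    return w_xh, w_hh


graph = tf.Graph()
with graph.as_default():
    # The scope's initializer becomes the default for every get_variable call inside it.
    init = tf.random_uniform_initializer(-0.05, 0.05, seed=0)
    with tf.variable_scope("encoder", initializer=init):
        w_xh, w_hh = build_weights(8, 4)

    # Same scope name plus reuse=True: get_variable looks up the existing variables
    # instead of creating new ones.
    with tf.variable_scope("encoder", reuse=True):
        w_xh_shared, w_hh_shared = build_weights(8, 4)

    variable_names = sorted(v.name for v in tf.global_variables())

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        w_xh_value, w_hh_value = sess.run([w_xh, w_hh])
```

Only two variables exist in the graph after the second call, and both weight matrices were filled by the scope-level initializer even though build_weights never mentions one.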

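【Discussion】:

How can there be a single initializer for all variables in a scope? Shouldn't I provide one initializer per weight matrix, each with its own dimensions?
get_variable specifies the variable's shape (and its dtype) and passes it to the initializer. The specifications are decoupled, although in some cases the initializer definitely depends on the shape information (uniform_unit_scaling_initializer, for example).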
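
The decoupling described in this reply can be sketched without TensorFlow. The code below is a toy model, not TensorFlow's implementation: get_variable here is a hypothetical stand-in, and the Glorot (Xavier) uniform rule is swapped in for uniform_unit_scaling_initializer because its formula, limit = sqrt(6 / (fan_in + fan_out)), is simple to state. The point is only that one initializer object can serve matrices of different shapes, because the shape arrives at call time.

```python
import math
import random


def glorot_uniform_initializer(seed=None):
    """Return a shape-agnostic initializer; the scale is computed once a shape is supplied."""
    rng = random.Random(seed)

    def init(shape):
        fan_in, fan_out = shape
        limit = math.sqrt(6.0 / (fan_in + fan_out))  # depends on the shape it receives
        return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
                for _ in range(fan_in)]

    return init


def get_variable(name, shape, initializer):
    """Toy stand-in for tf.get_variable: it owns the shape and hands it to the initializer."""
    return {"name": name, "shape": tuple(shape), "value": initializer(shape)}


# One initializer, two weight matrices with different dimensions.
init = glorot_uniform_initializer(seed=0)
w_xh = get_variable("w_xh", [8, 4], init)  # input to hidden
w_hh = get_variable("w_hh", [4, 4], init)  # hidden to hidden
```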

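【Solution 2】:

How do I initialize the weight matrices for an RNN?

I believe people use random normal initialization for the weight matrices of RNNs. Check out the example in the TensorFlow GitHub repo. The notebook is a bit long, so in short: it has a simple LSTM model that uses tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (I have also tried tf.ones for the biases, and that seemed to work too). I believe the standard deviation is a hyperparameter you can tune yourself. Weight initialization is sometimes important for gradient flow. As far as I know, though, the LSTM itself was designed to deal with the vanishing gradient problem (and gradient clipping addresses the exploding gradient problem), so perhaps you don't need to be extremely careful about setting std_dev in an LSTM? I have read papers that recommend Xavier initialization (TF API doc for Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it in RNNs, but you could try it there as well if you want to see whether it helps.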
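
For readers unfamiliar with the truncated normal scheme, here is a dependency-free sketch of what it does: values are drawn from a normal distribution and any draw more than two standard deviations from the mean is re-drawn, which is the behaviour documented for tf.truncated_normal. The function names, the shapes and the mean and stddev values are illustrative assumptions, not taken from the notebook.

```python
import random


def truncated_normal(shape, mean=0.0, stddev=0.1, seed=None):
    """Sample a rows x cols matrix, re-drawing values beyond two standard deviations."""
    rng = random.Random(seed)
    rows, cols = shape

    def draw():
        while True:
            x = rng.gauss(mean, stddev)
            if abs(x - mean) <= 2.0 * stddev:
                return x

    return [[draw() for _ in range(cols)] for _ in range(rows)]


def zeros(shape):
    """All-zero matrix, the usual starting point for biases."""
    rows, cols = shape
    return [[0.0] * cols for _ in range(rows)]


# Illustrative sizes: 27 input symbols, 64 hidden units.
weights = truncated_normal([27, 64], mean=-0.1, stddev=0.1, seed=0)
bias = zeros([1, 64])
```

The stddev argument is the hyperparameter the answer refers to: a smaller value keeps the initial pre-activations closer to the mean.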

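Now, following up on @Allen's answer and the follow-up question you left in the comments.

How do I control initialization with variable scope?

Take the simple LSTM model in the TensorFlow GitHub Python notebook I linked to as an example. Specifically, if I wanted to refactor the LSTM part of that code using variable-scope control, I could write something like the following...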

import tensorflow as tf
def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''initialize LSTMcell weights and biases, set variables to reuse mode'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                # Pass the initializer by keyword: the third positional argument of get_variable is dtype.
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                # This call can probably be omitted: putting the 'LSTMcell' scope into reuse
                # mode below also turns on reuse for all of its child scopes.
                gate_scope.reuse_variables()
        scope.reuse_variables()

def get_scope_variables(scope_name, variable_names):
    '''a helper function to fetch variable based on scope_name and variable_name'''
    vars = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            var = tf.get_variable(var_name)
            vars[var_name] = var
    return vars

def LSTMcell(i, o, state):
    '''a function for performing LSTMcell computation'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            vars = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state

The refactored code is used like this...

initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)

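Although the refactored code may look less straightforward, controlling variables through scopes ensures scope encapsulation and allows flexible control of variables (at least in my opinion).

On pre-training some LSTMs and using their weights in a subsequent model: how do you do that without saving all the states and restoring the entire model?

Assuming you have a pre-trained model frozen and loaded, if you want to use its frozen "wx", "wt" and "bi", you can simply find their parent scope names and variable names, then fetch the variables with a structure similar to the one in the get_scope_variables function.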

with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)
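
If the pre-trained weights live in a checkpoint rather than in the current graph, one possible way to load only the LSTM variables is tf.train.Saver with an explicit var_list. This is a sketch under assumptions, not something the answer itself describes: it uses the TF 1.x API through tensorflow.compat.v1, and the scope names "classifier" and "new_head", the shapes and the temporary checkpoint path are hypothetical.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

ckpt_path = os.path.join(tempfile.mkdtemp(), "pretrained.ckpt")

# 1) "Pre-trained" model: save only the variables under the LSTMcell scope.
with tf.Graph().as_default():
    with tf.variable_scope("LSTMcell"):
        with tf.variable_scope("input_gate"):
            tf.get_variable("wx", [5, 3], initializer=tf.constant_initializer(0.5))
    with tf.variable_scope("classifier"):
        tf.get_variable("w_out", [3, 2])
    lstm_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="LSTMcell")
    saver = tf.train.Saver(var_list=lstm_vars)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, ckpt_path)

saved_names = [name for name, _ in tf.train.list_variables(ckpt_path)]

# 2) Subsequent model: same scope and variable names for the LSTM part, a different head.
with tf.Graph().as_default():
    with tf.variable_scope("LSTMcell"):
        with tf.variable_scope("input_gate"):
            wx_new = tf.get_variable("wx", [5, 3])
    with tf.variable_scope("new_head"):
        tf.get_variable("w_out", [3, 7])
    lstm_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="LSTMcell")
    restorer = tf.train.Saver(var_list=lstm_vars)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # initialise everything first
        restorer.restore(sess, ckpt_path)            # then overwrite only the LSTM weights
        restored_wx = sess.run(wx_new)
```

Because the Saver matches variables by name, this only works when the scope and variable names in the new model match those stored in the checkpoint.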

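Here is a link to understanding variable scope and sharing variables. I hope this helps.

【Discussion】:

【Solution 3】:

A simple way to initialize all kernel weights with a specific initializer is to set the initializer in tf.variable_scope(). For example: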

with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
    basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
    outputs, state = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

【Discussion】: