[Question title]: How do I share weights across different RNN cells that feed in different inputs in Tensorflow?
[Posted]: 2017-11-29 20:28:48
[Question description]:

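I am curious whether there is a good way to share weights across different RNN cells while still feeding each cell a different input.

The graph I am trying to build looks like this:

where there are three orange LSTM cells running in parallel, and I would like to share weights among them.

I have managed to implement something similar to what I want using a placeholder (see code below). However, using a placeholder breaks the optimizer's gradient computation, and nothing past the point where I use the placeholder gets trained. Is it possible to do this in a better way in TensorFlow?

I am using Tensorflow 1.2 and python 3.5 in an Anaconda environment on Windows 7.

Code: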

def ann_model(cls,data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data,ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights) 
                                          + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 =[cls.rnn_lower_model(cls.input_place)]*2

When training, each mini-batch is computed in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})

Thanks for your help. Any ideas would be greatly appreciated.

[Comments]:

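Why do you have two feed_dicts?
The second is the same as the first, but also includes 'RNNinput', the result of the first 'sess.run'. That is how I pass the output of the lower layer with the shared RNN cells to the upper layer. I use the placeholder 'cls.input_place' in the second 'sess.run' call to do this. Unfortunately, this breaks tensorflow's backpropagation calculation.
You shouldn't do that. You can build a graph like the one mentioned in the link, feed in the inputs in one go, and let the whole network train. Is there any reason why you can't do that?
Because each weight-sharing RNN cell in the middle layer needs a different input to create 3 different outputs, which are concatenated and then fed to the final layer. To share the weights I had to use [cls.rnn_lower_model(cls.input_place)]*2. If input_place is just one node in the graph, I can't vary the input to the different instances of the same shared cell.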
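
A side note on the expression [cls.rnn_lower_model(cls.input_place)]*2 mentioned above: in plain Python, multiplying a list calls the function once and repeats the reference to its result, so both names end up pointing at the same object rather than at two weight-sharing copies. A minimal sketch, where build_model is a hypothetical stand-in for a graph-building function and is not taken from the question:

```python
calls = []

def build_model(name):
    """Hypothetical stand-in for a function that builds part of a graph."""
    calls.append(name)
    return {"built_from": name}

# The function runs once; the list then holds two references to one result.
first, second = [build_model("input_place")] * 2

print(len(calls))       # 1
print(first is second)  # True
```

To obtain separate outputs, the function would have to be called once per input.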

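Tags: tensorflow deep-learning recurrent-neural-network

[Solution 1]:

You have 3 different inputs: input_1, input_2, input_3. Feed them to an LSTM model whose parameters are shared, then concatenate the outputs of the 3 LSTMs and pass the result to the final LSTM layer. The code should look something like this: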

 # Create input placeholder for the network
 input_1 = tf.placeholder(...)
 input_2 = tf.placeholder(...)
 input_3 = tf.placeholder(...)

 # create a shared rnn layer 
 def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)

 # generate the outputs for each input
 with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables() # the variables will be reused.
    out_input_2 = shared_rnn(...)
     scope.reuse_variables()
    out_input_3 = shared_rnn(...)

 # verify whether the variables are reused
 for v in tf.global_variables():
    print(v.name)

 # concat the three outputs
 output = tf.concat...  

 # Pass it to the final_lstm layer and out the logits
 logits = final_layer(output, ...)

 train_op = ...

 # train
 sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
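
The idea in the outline above can be illustrated without TensorFlow. The following is only a sketch under stated assumptions: it uses NumPy and a plain tanh RNN cell instead of tf.nn.rnn_cell.BasicLSTMCell, and the class name SharedTanhCell and all shapes are made up for illustration. The point it shows is that a single set of weights is created once and then applied to three different inputs, which is what variable reuse achieves in the TensorFlow graph.

```python
import numpy as np

class SharedTanhCell:
    """A plain tanh RNN cell (not an LSTM) that owns one set of weights."""

    def __init__(self, n_inputs, n_units, seed=0):
        rng = np.random.default_rng(seed)
        self.w_x = rng.normal(size=(n_inputs, n_units))
        self.w_h = rng.normal(size=(n_units, n_units))
        self.b = np.zeros(n_units)

    def run(self, sequence):
        """sequence: [time, batch, n_inputs] -> final state [batch, n_units]."""
        h = np.zeros((sequence.shape[1], self.b.shape[0]))
        for x_t in sequence:
            h = np.tanh(x_t @ self.w_x + h @ self.w_h + self.b)
        return h

cell = SharedTanhCell(n_inputs=4, n_units=3)
rng = np.random.default_rng(1)
in1, in2, in3 = (rng.normal(size=(5, 2, 4)) for _ in range(3))

# Same weights, three different inputs.
out1, out2, out3 = cell.run(in1), cell.run(in2), cell.run(in3)

# Concatenate the three outputs along the feature axis: [batch, 3 * n_units].
merged = np.concatenate([out1, out2, out3], axis=1)
print(merged.shape)  # (2, 9)
```

Because all three calls read the same w_x, w_h and b, any update to those arrays affects all three branches at once.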

[Comments]:

Thanks. This is more like what I wanted to do.

[Solution 2]:

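I ended up rethinking my architecture and came up with a more workable solution.

Instead of replicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell 3 times. The result of each run is stored in a "buffer" such as a tf.Variable, and the whole variable is then used as the input to the final LSTM layer. I drew a diagram here

Implementing it this way gives a valid output after 3 time steps and does not break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN can still be trained).

The only tricky part is making sure the buffer is in the correct order for the final RNN.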
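
The answer does not include code, so the following is only a rough NumPy sketch of the ordering concern, not the author's implementation. A Python list stands in for the tf.Variable buffer, middle_cell is a hypothetical single tanh step rather than an LSTM, and all shapes are assumptions. It shows one cell being run three times, with the results stacked along a time axis so that step k of the final layer's input holds the result of run k.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units, batch = 4, 3, 2
w_x = rng.normal(size=(n_inputs, n_units))  # the single set of weights
b = np.zeros(n_units)

def middle_cell(x):
    """Hypothetical stand-in for the shared middle cell (one tanh step)."""
    return np.tanh(x @ w_x + b)

inputs = [rng.normal(size=(batch, n_inputs)) for _ in range(3)]

# Run the same cell three times; the list preserves the order of the runs.
buffer = [middle_cell(x) for x in inputs]

# Stack along a new time axis: [batch, 3, n_units], step k holds run k.
final_input = np.stack(buffer, axis=1)
print(final_input.shape)  # (2, 3, 3)
```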

[Comments]: