[Question Title]: Structure of initial state for stacked LSTM
[Posted]: 2019-04-19 17:18:47
[Question Description]:

What is the required structure of the initial state for a multi-layer/stacked RNN in TensorFlow (1.13.1) using the tf.keras.layers.RNN API?

I tried the following:

lstm_cell_sizes = [256, 256, 256]
lstm_cells = [tf.keras.layers.LSTMCell(size) for size in lstm_cell_sizes]

state_init = [tf.placeholder(tf.float32, shape=[None] + cell.state_size) for cell in lstm_cells]

tf.keras.layers.RNN(lstm_cells, ...)(inputs, initial_state=state_init)

This results in:

ValueError: Could not pack sequence. Structure had 6 elements, but flat_sequence had 3 elements.  Structure: ([256, 256], [256, 256], [256, 256]), flat_sequence: [<tf.Tensor 'player/Placeholder:0' shape=(?, 256, 256) dtype=float32>, <tf.Tensor 'player/Placeholder_1:0' shape=(?, 256, 256) dtype=float32>, <tf.Tensor 'player/Placeholder_2:0' shape=(?, 256, 256) dtype=float32>].

If I change state_init to a flat list of tensors of shape [None, 256], I get:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 256), ndim=2), InputSpec(shape=(None, 256), ndim=2), InputSpec(shape=(None, 256), ndim=2)]; however `cell.state_size` is [[256, 256], [256, 256], [256, 256]]

The TensorFlow RNN docs are fairly vague on this:

"You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer."

[Question Discussion]:

  • I'm facing exactly the same problem right now. A solution would be great!

Tags: tensorflow keras lstm recurrent-neural-network


[Solution 1]:

I believe this is how you would do it in TF2:

import numpy as np
import tensorflow.compat.v2 as tf  # if you have a newer version of TF1
#import tensorflow as tf           # if you have TF2

sentence_max_length = 5
batch_size = 3
n_hidden = 2
x = tf.constant(np.reshape(np.arange(30),(batch_size,sentence_max_length, n_hidden)), dtype = tf.float32)

stacked_lstm = tf.keras.layers.StackedRNNCells([tf.keras.layers.LSTMCell(128) for _ in range(2)])

lstm_layer = tf.keras.layers.RNN(stacked_lstm,return_state=False,return_sequences=False)

result = lstm_layer(x)
print(result)
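A hedged follow-up sketch (not part of the original answer): the same stacked layer can also be given an explicit initial_state, here as one [h, c] pair of zero tensors per LSTMCell; any tensors of the same shapes, e.g. placeholders in TF1, should work the same way:

```python
import numpy as np
import tensorflow as tf

batch_size, sentence_max_length, n_features = 3, 5, 2
x = tf.constant(np.arange(30).reshape(batch_size, sentence_max_length, n_features),
                dtype=tf.float32)

stacked_lstm = tf.keras.layers.StackedRNNCells(
    [tf.keras.layers.LSTMCell(128) for _ in range(2)])
lstm_layer = tf.keras.layers.RNN(stacked_lstm)

# One [h, c] pair of shape [batch, units] per LSTMCell in the stack.
state_init = [[tf.zeros([batch_size, 128]), tf.zeros([batch_size, 128])]
              for _ in range(2)]

result = lstm_layer(x, initial_state=state_init)
print(result.shape)  # (3, 128)
```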

[Discussion]:

[Solution 2]:

In TF2, the API docs for tf.keras.layers.RNN, tf.keras.layers.LSTM, and tf.keras.layers.GRU show that the call() functions of these layers all take an initial_state= argument. Here is the description:

initial_state: List of initial state tensors to be passed to the first call of the cell (optional, defaults to None, which causes creation of zero-filled initial state tensors).

Below is a modified version of the small tf.keras.layers.StackedRNNCells example:

import numpy as np
import tensorflow as tf

batch_size = 3
sentence_max_length = 5
n_features = 2      # size of each input feature vector
n_hid_layers = 2    # number of stacked LSTM layers
new_shape = (batch_size, sentence_max_length, n_features)
x = tf.constant(np.reshape(np.arange(30), new_shape), dtype=tf.float32)

rnn_cells = [tf.keras.layers.LSTMCell(128) for _ in range(n_hid_layers)]
stacked_lstm = tf.keras.layers.StackedRNNCells(rnn_cells)

# Return the state to serve the dual purpose of
#  1) verifying that the returned state is a list and
#  2) re-using the returned state as the initial_state
#     in a subsequent call to the lstm_layer
lstm_layer = tf.keras.layers.RNN(stacked_lstm, return_state=True, return_sequences=False)

# In the first call, initial_state=None, which leads to using zero-filled
# initial states. The call to lstm_layer returns a list: result[0] is the
# output of the lstm; result[1] and result[2] are the states of the 1st
# and 2nd layer of the lstm, respectively.
result = lstm_layer(x)
result2 = lstm_layer(x, initial_state=result[1:])

Here is a snapshot of the variables in my PyCharm Python console (screenshot not reproduced here):
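Since the screenshot did not survive, the same information can be printed directly. A small self-contained sketch, using tf.nest.flatten so it works whether the states come back nested per layer or as a flat list:

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.arange(30).reshape(3, 5, 2), dtype=tf.float32)
cells = [tf.keras.layers.LSTMCell(128) for _ in range(2)]
lstm_layer = tf.keras.layers.RNN(tf.keras.layers.StackedRNNCells(cells),
                                 return_state=True)

result = lstm_layer(x)
print("output:", result[0].shape)      # (3, 128)
for t in tf.nest.flatten(result[1:]):
    print("state tensor:", t.shape)    # four (3, 128) tensors: h and c per layer
```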

[Discussion]:
