Calling a basic LSTM cell within a custom Tensorflow cell: answers

【Question title】: Calling a basic LSTM cell within a custom Tensorflow cell
【Posted】: 2017-09-11 02:23:22
【Question】:

I'm trying to implement the Match-LSTM from this paper: https://arxiv.org/pdf/1608.07905.pdf
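For reference, the match-LSTM layer in that paper is defined (shown here for one direction, with the paper's direction arrows dropped) roughly as below, where $h^p_i$ is the encoding of the $i$-th passage token, $H^q$ the encoded question with $Q$ columns, $h^r_{i-1}$ the previous match-LSTM hidden state, and $e_Q$ a length-$Q$ vector of ones, so $\otimes e_Q$ repeats a column $Q$ times. The variable names `W_p`, `W_r`, `b_p`, `w`, and `b` in the question's code correspond to these symbols.

```latex
\begin{aligned}
G_i &= \tanh\!\big(W^q H^q + (W^p h^p_i + W^r h^r_{i-1} + b^p) \otimes e_Q\big) \\
\alpha_i &= \operatorname{softmax}\!\big(w^\top G_i + b \otimes e_Q\big) \\
z_i &= \begin{bmatrix} h^p_i \\ H^q \alpha_i^\top \end{bmatrix} \\
h^r_i &= \operatorname{LSTM}\!\big(z_i,\ h^r_{i-1}\big)
\end{aligned}
```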

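I'm using Tensorflow. Part of the architecture is an RNN that uses the input and the previous state to compute an attention vector, applies it to the context, concatenates the result with the input, and sends that into an LSTM. To build the first part of this RNN, I wrote a custom cell for Tensorflow to call, but I'm not sure how to send its result to the LSTM. Is it possible to call a basic LSTM cell inside the custom cell I'm writing? I've tried several approaches, but I keep getting the error `'module' object has no attribute 'rnn_cell'` on the line that calls the LSTM cell. Any help would be greatly appreciated!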
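The first part of the step described above (attention computed from the current input and the previous state, applied to the encoded question as the code below does with `encoded_questions`, then concatenated with the input) can be sketched for a single example in NumPy. This is an illustration under assumptions, not TensorFlow code: `match_attention_step` and its argument names are hypothetical, the weights are random, and `H_q` is assumed to hold one question token per column.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def match_attention_step(h_p, h_r_prev, H_q, W_q, W_p, W_r, b_p, w, b):
    """Build the LSTM input z for one passage token (single example, no batch axis).

    h_p:      (l,)   encoding of the current passage token (the cell input)
    h_r_prev: (l,)   previous hidden state h of the match LSTM
    H_q:      (l, Q) encoded question, one column per question token
    W_q, W_p, W_r: (l, l); b_p: (l,); w: (l,); b: scalar
    """
    # The (l,) token/state term is broadcast across all Q question columns.
    G = np.tanh(W_q @ H_q + (W_p @ h_p + W_r @ h_r_prev + b_p)[:, None])
    alpha = softmax(w @ G + b)            # (Q,) attention over question tokens
    attended = H_q @ alpha                # (l,) attention-weighted question
    z = np.concatenate([h_p, attended])   # (2l,) what gets fed to the LSTM
    return z, alpha


rng = np.random.default_rng(0)
l, Q = 4, 6
h_p, h_r, H_q = rng.normal(size=l), rng.normal(size=l), rng.normal(size=(l, Q))
params = dict(W_q=rng.normal(size=(l, l)), W_p=rng.normal(size=(l, l)),
              W_r=rng.normal(size=(l, l)), b_p=rng.normal(size=l),
              w=rng.normal(size=l), b=0.3)
z, alpha = match_attention_step(h_p, h_r, H_q, **params)
print(z.shape, alpha.round(3))
```

Because `b` is a single scalar added to every score, it cancels inside the softmax; it only mirrors the paper's formula. The resulting `z` is then passed, together with the LSTM's own state, to an ordinary LSTM cell.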

Edit, adding my code:

    import numpy as np
    import tensorflow as tf

    class MatchLSTMCell(tf.contrib.rnn.RNNCell):

        def __init__(self, state_size, question_tensor, encoded_questions, batch_size):
            self._state_size = state_size
            self.question_tensor = question_tensor
            self.encoded_questions = encoded_questions
            self.batch_size = batch_size

        @property
        def state_size(self):
            return self._state_size

        @property
        def output_size(self):
            return self._state_size

        def __call__(self, inputs, state, scope=None):
            scope = scope or type(self).__name__

            with tf.variable_scope(scope):

                W_p = tf.get_variable("W_p", dtype=tf.float64, shape=[self.state_size, self.state_size], initializer=tf.contrib.layers.xavier_initializer())
                W_r = tf.get_variable("W_r", dtype=tf.float64, shape=[self.state_size, self.state_size], initializer=tf.contrib.layers.xavier_initializer())
                b_p = tf.get_variable("b_p", dtype=tf.float64, shape=[self.state_size])
                w = tf.get_variable("w", dtype=tf.float64, shape=[1, self.state_size])
                b = tf.get_variable("b", dtype=tf.float64, shape=[])

                #print 'question tensor', np.shape(self.question_tensor)
                #print 'inputs', np.shape(inputs)
                #print 'insides', np.shape(tf.matmul(inputs, W_p) + tf.matmul(state, W_r) + b_p)
                # G: (batch, Q, state_size); the input/state term is broadcast over the Q question positions
                G = tf.nn.tanh(
                    tf.transpose(tf.transpose(self.question_tensor, perm=[1, 0, 2]) +
                                 (tf.matmul(inputs, W_p) + tf.matmul(state, W_r) + b_p), perm=[1, 0, 2])
                )
                #print 'big G', np.shape(G)

                attention_list = []
                for i in range(self.batch_size):
                    attention_matrix = tf.matmul(G[i, :, :], tf.transpose(w))
                    attention_list.append(attention_matrix)
                attention_scores = tf.stack(attention_list)  # (batch, Q, 1)
                # Flatten to (batch, Q) before the softmax: tf.nn.softmax normalizes over the
                # last axis, which would otherwise have size 1 and give all ones.
                a = tf.nn.softmax(tf.reshape(attention_scores, [self.batch_size, -1]) + b)
                #print 'a shape is', np.shape(a)

                weighted_question_list = []
                for i in range(self.batch_size):
                    attention_vector = tf.matmul(tf.reshape(a[i], [1, -1]), self.encoded_questions[i])
                    weighted_question_list.append(attention_vector)
                weighted_questions = tf.stack(weighted_question_list)
                weighted_questions = tf.reshape(weighted_questions, [self.batch_size, -1])
                #print 'weighted questions', np.shape(weighted_questions)

                z = tf.concat([inputs, weighted_questions], 1)
                # This is the line that raises: 'module' object has no attribute 'rnn_cell'
                lstm_cell = tf.nn.rnn_cell.LSTMCell(self.state_size)
                output, new_state = lstm_cell(z, state)

            return output, new_state
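The two `for i in range(self.batch_size)` loops only work with a static Python-int batch size. Batched operations (for example `tf.einsum`, or `tf.matmul` on rank-3 tensors) can usually replace them in TensorFlow 1.x. The NumPy sketch below, with hypothetical function names and NumPy standing in for TF so it runs standalone, checks that a loop formulation and an `einsum` formulation of the attention agree. It also applies the softmax over the Q question tokens: softmaxing a (batch, Q, 1) tensor over its last axis would return all ones.

```python
import numpy as np


def softmax_rows(x):
    """Softmax over the last axis of a 2-D array."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def attend_with_loops(G, w, H):
    """Per-example loops, shaped like the question's code.

    G: (batch, Q, l) tanh activations; w: (1, l); H: (batch, Q, l) encoded question.
    """
    batch = G.shape[0]
    scores = np.stack([G[i] @ w.T for i in range(batch)])        # (batch, Q, 1)
    a = softmax_rows(scores.reshape(batch, -1))                   # softmax over Q
    weighted = np.stack([a[i][None, :] @ H[i] for i in range(batch)])  # (batch, 1, l)
    return weighted.reshape(batch, -1)                            # (batch, l)


def attend_batched(G, w, H):
    """Same computation with no Python loop over the batch."""
    scores = np.einsum("bql,l->bq", G, w[0])                      # (batch, Q)
    a = softmax_rows(scores)
    return np.einsum("bq,bql->bl", a, H)                          # (batch, l)


rng = np.random.default_rng(1)
G = np.tanh(rng.normal(size=(3, 5, 4)))
w = rng.normal(size=(1, 4))
H = rng.normal(size=(3, 5, 4))
print(np.allclose(attend_with_loops(G, w, H), attend_batched(G, w, H)))
```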

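【Discussion】:

It's hard to look into this without code, but I'd like to help. What would help is to write a minimal test program that shows whether your custom RNN works, and a second minimal test showing how you use the LSTM and whether that works. Those two programs will help others on Stack Overflow (including me) help you debug the problem.
Thanks! I'll start writing those programs. In the meantime I've added my code to the post in case it helps. Thanks for your help!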

Tags: tensorflow

【Answer 1】:

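I'm also trying to reimplement Match_LSTM for SQuAD to run experiments. I used MurtyShikhar's implementation as a reference, and it works! However, he had to write a custom AttentionWrapper and use the existing BasicLSTM cell.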
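Keeping the existing LSTM cell, rather than re-implementing it, mostly comes down to how the state is passed: the custom cell holds one inner LSTM cell, unpacks the (c, h) state only to read h for the attention, and hands the whole state to the inner cell. The framework-free sketch below shows only that control flow; `MatchCellSketch`, `attend`, and `identity_cell` are hypothetical stand-ins. In TF 1.x the inner cell would typically be an LSTM cell created once in `__init__`, with `state_size` and `output_size` returning the inner cell's values, as the library's own cell wrappers do.

```python
import numpy as np


class MatchCellSketch:
    """Hypothetical outer cell: builds z, then delegates to an inner LSTM cell.

    inner:  callable with the RNN-cell convention inner(inputs, state) -> (output, new_state),
            where state is a (c, h) pair
    attend: callable (x, h) -> attention-weighted question vector
    """

    def __init__(self, inner, attend):
        self.inner = inner    # built once, reused at every time step
        self.attend = attend

    def __call__(self, x, state):
        c, h = state          # an LSTM state is a (c, h) pair, not a single tensor
        z = np.concatenate([x, self.attend(x, h)], axis=-1)
        return self.inner(z, state)  # the inner cell's (output, new_state) is returned as-is


def identity_cell(z, state):
    """Stand-in inner cell for the demo: returns z and leaves the state unchanged."""
    return z, state


cell = MatchCellSketch(identity_cell, attend=lambda x, h: x + h)
state = (np.zeros(3), np.ones(3))
out, new_state = cell(np.arange(3.0), state)
print(out)
```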

I also tried creating a Match_LSTM_cell by feeding z and state into BasicLSTM as the (inputs, state) pair:

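    import tensorflow as tf
    from tensorflow.python.ops import array_ops
    from tensorflow.python.ops import math_ops
    from tensorflow.python.ops import variable_scope as vs
    # _linear is a private TF helper; this path is from TF 1.2/1.3 and may differ in other versions
    from tensorflow.python.ops.rnn_cell_impl import _linear

    def __call__(self, inputs, state):
        # c is not an output; it is the cell's memory (a "memory keeper").
        # new_c must be computed and passed on through the LSTM state.
        c, h = state

        # ...Calculate your z
        # ...inputs will be each token in the context (passage), one per step
        # ...Calculate alpha_Q
        z = tf.concat([inputs, alpha_Q], axis=1)

        # This part re-implements BasicLSTM
        with vs.variable_scope("LSTM_core"):
            sigmoid = math_ops.sigmoid
            # One affine map of [z, h] yields all four gates; dimension = number of LSTM units
            concat = _linear([z, h], dimension * 4, bias=True)
            # i: input gate, j: candidate memory, f: forget gate, o: output gate
            i, j, f, o = array_ops.split(concat, num_or_size_splits=4, axis=1)
            new_c = c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j)

            new_h = self._activation(new_c) * sigmoid(o)
            new_state = (new_c, new_h)
        return new_h, new_state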
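The `LSTM_core` block above is the standard BasicLSTM update. Here is a NumPy sketch of the same arithmetic that runs without TensorFlow. It assumes `_linear` is an affine map of the concatenated `[z, h]` plus a bias, which is what it computes in TF 1.x; `lstm_core`, `W`, and `bias` are hypothetical names, and `tanh` plays the role of `self._activation`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_core(z, state, W, bias, forget_bias=1.0):
    """One BasicLSTM-style update for a batch.

    z:     (batch, d_in) cell input, e.g. [passage token ; attended question]
    state: (c, h), each (batch, n): memory and hidden state from the previous step
    W:     (d_in + n, 4 * n) and bias: (4 * n,), the affine map _linear([z, h], 4 * n) builds
    """
    c, h = state
    concat = np.concatenate([z, h], axis=1) @ W + bias
    # i: input gate, j: candidate memory, f: forget gate, o: output gate
    i, j, f, o = np.split(concat, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)


rng = np.random.default_rng(2)
batch, d_in, n = 2, 6, 3
z = rng.normal(size=(batch, d_in))
state = (np.zeros((batch, n)), np.zeros((batch, n)))
W = rng.normal(scale=0.1, size=(d_in + n, 4 * n))
bias = np.zeros(4 * n)
out, (new_c, new_h) = lstm_core(z, state, W, bias)
print(out.shape, new_c.shape)
```

With `forget_bias=1.0` (the `BasicLSTMCell` default) the forget gate starts out mostly open, so the memory `c` tends to be carried forward early in training.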
