[Question Title]: Explanation of GRU cell in Tensorflow?
[Posted]: 2016-08-01 06:47:01
[Question]:

The following code for Tensorflow's GRUCell shows the typical operations for obtaining the updated hidden state, when the previous hidden state is provided along with the current input in the sequence.

  def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with vs.variable_scope(scope or type(self).__name__):  # "GRUCell"
      with vs.variable_scope("Gates"):  # Reset gate and update gate.
        # We start with bias of 1.0 to not reset and not update.
        r, u = array_ops.split(1, 2, _linear([inputs, state],
                                             2 * self._num_units, True, 1.0))
        r, u = sigmoid(r), sigmoid(u)
      with vs.variable_scope("Candidate"):
        c = self._activation(_linear([inputs, r * state],
                                     self._num_units, True))
      new_h = u * state + (1 - u) * c
    return new_h, new_h

But I don't see any weights or biases here. For example, my understanding is that obtaining r and u would require weights and biases to be multiplied with the current input and/or the hidden state to get the updated hidden state.

I have written a gru unit as follows:

def gru_unit(previous_hidden_state, x):
    # Reset gate and update gate.
    r  = tf.sigmoid(tf.matmul(x, Wr) + br)
    z  = tf.sigmoid(tf.matmul(x, Wz) + bz)
    # Candidate hidden state, with the reset gate applied to the previous state.
    h_ = tf.tanh(tf.matmul(x, Wx) + tf.matmul(previous_hidden_state, Wh) * r)
    # Interpolate between the previous state and the candidate via the update gate.
    current_hidden_state = tf.mul((1 - z), h_) + tf.mul(previous_hidden_state, z)
    return current_hidden_state

Here I explicitly make use of the weights Wx, Wr, Wz, Wh and biases br, bh, bz, etc. to get the updated hidden state. These weights and biases are what get learned/tuned during training.

How can I make use of Tensorflow's built-in GRUCell to achieve the same result as above?

[Comments]:

  • They concatenate the r and z gates and do everything in one operation, which saves computation.
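The fusion described in the comment above can be checked numerically: computing both gate pre-activations with one matrix multiply on the concatenated [inputs, state] gives the same result as two separate multiplies with the corresponding column slices of the weight matrix. A minimal NumPy sketch with made-up sizes (the names `W`, `Wr`, `Wz` are illustrative, not TF's):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_units = 4, 3, 5

x = rng.standard_normal((batch, n_in))
h = rng.standard_normal((batch, n_units))

# One big weight matrix mapping [x, h] -> 2 * n_units (r and z stacked).
W = rng.standard_normal((n_in + n_units, 2 * n_units))

# Single fused matmul, then split into the two gate pre-activations.
pre = np.concatenate([x, h], axis=1) @ W
r_pre, z_pre = np.split(pre, 2, axis=1)

# Equivalent "two separate gates" computation using slices of W.
Wr, Wz = W[:, :n_units], W[:, n_units:]
r_sep = np.concatenate([x, h], axis=1) @ Wr
z_sep = np.concatenate([x, h], axis=1) @ Wz

assert np.allclose(r_pre, r_sep) and np.allclose(z_pre, z_sep)
```

One matmul over the concatenation is a pure performance optimization; mathematically it is identical to keeping Wr and Wz separate.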

Tags: neural-network tensorflow recurrent-neural-network gated-recurrent-unit


[Solution 1]:

They are there, you just don't see them in that code, because the _linear function is what adds the weights and biases.

r, u = array_ops.split(1, 2, _linear([inputs, state],
                                     2 * self._num_units, True, 1.0))

...

def _linear(args, output_size, bias, bias_start=0.0, scope=None):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

  Args:
    args: a 2D Tensor or a list of 2D, batch x n, Tensors.
    output_size: int, second dimension of W[i].
    bias: boolean, whether to add a bias term or not.
    bias_start: starting value to initialize the bias; 0 by default.
    scope: VariableScope for the created subgraph; defaults to "Linear".

  Returns:
    A 2D Tensor with shape [batch x output_size] equal to
    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

  Raises:
    ValueError: if some of the arguments has unspecified or wrong shape.
  """
  if args is None or (nest.is_sequence(args) and not args):
    raise ValueError("`args` must be specified")
  if not nest.is_sequence(args):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape().as_list() for a in args]
  for shape in shapes:
    if len(shape) != 2:
      raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
    if not shape[1]:
      raise ValueError("Linear expects shape[1] of arguments: %s" % str(shapes))
    else:
      total_arg_size += shape[1]

  # Now the computation.
  with vs.variable_scope(scope or "Linear"):
    matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
    if len(args) == 1:
      res = math_ops.matmul(args[0], matrix)
    else:
      res = math_ops.matmul(array_ops.concat(1, args), matrix)
    if not bias:
      return res
    bias_term = vs.get_variable(
        "Bias", [output_size],
        initializer=init_ops.constant_initializer(bias_start))
  return res + bias_term
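Putting the two pieces together, the whole GRUCell step can be replayed in NumPy to make explicit where each weight and bias lives. This is a sketch with invented shapes; `W_gates`/`b_gates` and `W_cand`/`b_cand` stand in for the "Matrix" and "Bias" variables that `_linear` creates under the "Gates" and "Candidate" scopes:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, W_gates, b_gates, W_cand, b_cand):
    """One GRU step mirroring GRUCell.__call__ above."""
    n = h.shape[1]
    # Gates: one fused linear map on [x, h]; TF initializes this bias to 1.0.
    pre = np.concatenate([x, h], axis=1) @ W_gates + b_gates
    r, u = sigmoid(pre[:, :n]), sigmoid(pre[:, n:])
    # Candidate: linear map on [x, r * h], then the activation (tanh by default).
    c = np.tanh(np.concatenate([x, r * h], axis=1) @ W_cand + b_cand)
    # Interpolate between previous state and candidate via the update gate.
    return u * h + (1.0 - u) * c

rng = np.random.default_rng(1)
batch, n_in, n_units = 2, 3, 4
x = rng.standard_normal((batch, n_in))
h = rng.standard_normal((batch, n_units))
W_gates = rng.standard_normal((n_in + n_units, 2 * n_units))
b_gates = np.ones(2 * n_units)   # bias_start=1.0 for the gates
W_cand = rng.standard_normal((n_in + n_units, n_units))
b_cand = np.zeros(n_units)

new_h = gru_step(x, h, W_gates, b_gates, W_cand, b_cand)
assert new_h.shape == (batch, n_units)
```

So the GRUCell holds exactly two weight matrices and two bias vectors, rather than the per-gate Wr/Wz/Wx/Wh of the hand-written version; the concatenation makes them equivalent.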

[Discussion]:

  • So it seems the weights and biases are created on demand, and shared across time steps via get_variable, which returns the same variable when called within the same variable scope. It is not clear to me, though, how the weight matrix is initialized.
  • I think it is initialized with the default initializer of the current variable scope.
  • I think this also answers my other question about tensorflow rnns.
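The create-once, reuse-later behavior of get_variable discussed above can be mimicked with a plain dictionary keyed by scope and name. This is a toy sketch of the semantics, not TF's actual implementation:

```python
import numpy as np

_variable_store = {}

def get_variable(scope, name, shape, initializer):
    """Toy stand-in for tf.get_variable: create on first call,
    return the same object on every later call with the same scope/name."""
    key = scope + "/" + name
    if key not in _variable_store:
        _variable_store[key] = initializer(shape)
    return _variable_store[key]

first = get_variable("GRUCell/Gates/Linear", "Matrix", (7, 10),
                     lambda s: np.zeros(s))
second = get_variable("GRUCell/Gates/Linear", "Matrix", (7, 10),
                      lambda s: np.zeros(s))
assert first is second  # same array is reused across "time steps"
```

Because every time step of the unrolled RNN runs inside the same variable scope, each step's _linear call gets back the very same weight matrix, which is how the parameters are shared.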