TensorFlow cross-entropy NaN, and changing the learning rate doesn't seem to have an impact

【Question title】: TensorFlow cross-entropy NaN, and changing the learning rate doesn't seem to have an impact
【Posted】: 2016-05-05 19:35:16
【Question description】:

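TL;DR: I am trying to build a bidirectional RNN for sequence tagging using TensorFlow.

The goal is to take the input "I like New York" and produce the output "O O LOC_START LOC".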
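For reference, here is a minimal sketch of how such a tag sequence could be turned into the one-hot targets that `input_y` expects for a single sentence. The tag inventory `TAGS` and the helper `one_hot_tags` are hypothetical; the question does not list the actual classes or show the preprocessing.

```python
# Hypothetical tag inventory; the question does not list the real classes.
TAGS = ["O", "LOC_START", "LOC"]


def one_hot_tags(tags, tag_set=TAGS):
    """Return one one-hot row per tag: shape [sequence_length, n_output_classes]."""
    index = {tag: i for i, tag in enumerate(tag_set)}
    return [
        [1.0 if index[tag] == i else 0.0 for i in range(len(tag_set))]
        for tag in tags
    ]


words = "I like New York".split()
tags = "O O LOC_START LOC".split()
targets = one_hot_tags(tags)

# Print each word next to its tag and one-hot target row.
for word, tag, row in zip(words, tags, targets):
    print(word, tag, row)
```

Each row contains exactly one 1.0, which is the property the cross-entropy loss further down relies on.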

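The graph compiles and runs, but the loss becomes NaN after 1 or 2 batches. I understand this could be a problem with the learning rate, but changing the learning rate seems to have no impact. I am using AdamOptimizer at the moment.

Any help would be appreciated.

Here is my code: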

    # The input and output: a sequence of words, embedded, and a sequence of word classifications, one-hot
    self.input_x = tf.placeholder(tf.float32, [None, n_sequence_length, n_embedding_dim], name="input_x")
    self.input_y = tf.placeholder(tf.float32, [None, n_sequence_length, n_output_classes], name="input_y")

    # New shape: [sequence_length, batch_size (None), embedding_dim]
    inputs = tf.transpose(self.input_x, [1, 0, 2])

    # New shape: [sequence_length * batch_size (None), embedding_dim]
    inputs = tf.reshape(inputs, [-1, n_embedding_dim])

    # Define weights
    w_hidden = tf.Variable(tf.random_normal([n_embedding_dim, 2 * n_hidden_states]))
    b_hidden = tf.Variable(tf.random_normal([2 * n_hidden_states]))

    w_out = tf.Variable(tf.random_normal([2 * n_hidden_states, n_output_classes]))
    b_out = tf.Variable(tf.random_normal([n_output_classes]))

    # Linear activation for the input; this will make it fit to the hidden size
    inputs = tf.nn.xw_plus_b(inputs, w_hidden, b_hidden)

    # Split up the batches into a Python list
    inputs = tf.split(0, n_sequence_length, inputs)

    # Now we define our cell. It takes one word as input, a vector of embedding_size length
    cell_forward = rnn_cell.BasicLSTMCell(n_hidden_states, forget_bias=0.0)
    cell_backward = rnn_cell.BasicLSTMCell(n_hidden_states, forget_bias=0.0)

    # And we add a Dropout Wrapper as appropriate
    if is_training and prob_keep < 1:
        cell_forward = rnn_cell.DropoutWrapper(cell_forward, output_keep_prob=prob_keep)
        cell_backward = rnn_cell.DropoutWrapper(cell_backward, output_keep_prob=prob_keep)

    # And we make it a few layers deep
    cell_forward_multi = rnn_cell.MultiRNNCell([cell_forward] * n_layers)
    cell_backward_multi = rnn_cell.MultiRNNCell([cell_backward] * n_layers)

    # returns outputs = a list T of tensors [batch, 2*hidden]
    outputs = rnn.bidirectional_rnn(cell_forward_multi, cell_backward_multi, inputs, dtype=dtypes.float32)

    # [sequence, batch, 2*hidden]
    outputs = tf.pack(outputs)

    # [batch, sequence, 2*hidden]
    outputs = tf.transpose(outputs, [1, 0, 2])

    # [batch * sequence, 2 * hidden]
    outputs = tf.reshape(outputs, [-1, 2 * n_hidden_states])

    # [batch * sequence, output_classes]
    self.scores = tf.nn.xw_plus_b(outputs, w_out, b_out)

    # [batch * sequence, output_classes]
    inputs_y = tf.reshape(self.input_y, [-1, n_output_classes])

    # [batch * sequence]
    self.predictions = tf.argmax(self.scores, 1, name="predictions")

    # Now calculate the cross-entropy
    losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, inputs_y)
    self.loss = tf.reduce_mean(losses, name="loss")

    if not is_training:
        return

    # Training
    self.train_op = tf.train.AdamOptimizer(1e-4).minimize(self.loss)

    # Evaluate model
    correct_pred = tf.equal(self.predictions, tf.argmax(inputs_y, 1))
    self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name="accuracy")

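【Question discussion】:

If you generate a score of 0 for a label that is present in the data, you will get NaN. One workaround is to add a small epsilon to your scores.
Thanks. Are you suggesting I modify this: self.scores = tf.nn.xw_plus_b(outputs, w_out, b_out) to something like this: self.scores = tf.nn.xw_plus_b(outputs, w_out, b_out) + epsilon_matrix
Running it right now after adding tf.constant(0.001, shape=[n_output_classes]) as my epsilon... it should only take a minute to break, or not. :-) We'll see.
Still NaN with the epsilon.
@maga Sorry for the long delay. No, I never got it to work completely. My results were mixed, possibly due to insufficient training data.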
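A possible reading of why the epsilon made no difference, sketched in plain Python: `self.scores` holds logits, and adding the same constant to every logit cancels out inside the softmax. The sketch assumes `tf.nn.softmax_cross_entropy_with_logits` behaves like the numerically stable log-softmax below; the helper names are hypothetical and this is not TensorFlow's own code.

```python
import math


def log_softmax(logits):
    """Stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot target and softmax(logits)."""
    return -sum(y * lp for y, lp in zip(one_hot, log_softmax(logits)))


label = [0.0, 1.0, 0.0]
logits = [2.0, -1.0, 0.5]
eps = 0.001

base = cross_entropy(logits, label)
# Adding the same epsilon to every logit leaves the loss unchanged.
shifted = cross_entropy([z + eps for z in logits], label)
# Even extreme (but finite) logits give a finite loss.
extreme = cross_entropy([1000.0, -1000.0, 0.0], label)
# A NaN that is already in the logits propagates to the loss.
poisoned = cross_entropy([float("nan"), 0.0, 0.0], label)

print(base, shifted, extreme, poisoned)
```

Under that assumption, a NaN loss points at NaN or infinite values that already exist in the scores or the inputs, not at a zero probability inside the loss itself.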

Tags: machine-learning tensorflow recurrent-neural-network

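【Solution 1】:

Could there be an example in the training data with something wrong in its label? When training hits that example, the cost would become NaN. I suggest this because the problem still seems to occur when the learning rate is zero, and after only a few batches.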
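One way to test this hypothesis before touching the model is to scan the label rows directly. This is a minimal sketch, assuming the labels are available as nested Python lists with one row per token; `bad_label_rows` and the sample data are hypothetical.

```python
import math


def bad_label_rows(labels, tol=1e-6):
    """Return indices of rows that are not finite, non-negative, and summing to 1."""
    bad = []
    for i, row in enumerate(labels):
        finite = all(math.isfinite(v) for v in row)
        valid = (
            finite
            and all(v >= 0.0 for v in row)
            and abs(sum(row) - 1.0) <= tol
        )
        if not valid:
            bad.append(i)
    return bad


# Made-up label rows for illustration only.
labels = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],           # all zeros: not a valid one-hot row
    [0.0, float("nan"), 0.0],  # contains NaN
    [0.0, 0.0, 1.0],
]
print(bad_label_rows(labels))  # [1, 2]
```

The same check applied to the embedded inputs (finite values only) would cover the other side of the feed.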

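Here is how I would debug it:

Set the batch size to 1
Set the learning rate to 0.0
When you run a batch, have TensorFlow output the intermediate values, not just the cost
Keep running until you get a NaN, then examine what the input was and, by inspecting the intermediate outputs, determine at which point the NaN first appears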
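The last step above can be sketched as a small helper that walks the fetched values in graph order and reports the first one containing NaN or infinity. It assumes the intermediate values (for example the results of fetching `self.scores`, `losses`, and `self.loss` in one `sess.run` call) have been converted to nested Python lists; `first_non_finite` and the sample values are hypothetical.

```python
import math


def contains_non_finite(value):
    """True if a scalar or arbitrarily nested list holds NaN or +/-inf."""
    if isinstance(value, (list, tuple)):
        return any(contains_non_finite(v) for v in value)
    return math.isnan(value) or math.isinf(value)


def first_non_finite(stages):
    """stages: list of (name, value) pairs in graph order.

    Returns the name of the earliest stage with a non-finite value, or None.
    """
    for name, value in stages:
        if contains_non_finite(value):
            return name
    return None


# Made-up values for a single-example batch, for illustration only.
stages = [
    ("input_x", [[0.1, -0.3]]),
    ("scores", [[2.0, float("nan"), 0.5]]),
    ("losses", [float("nan")]),
    ("loss", float("nan")),
]
print(first_non_finite(stages))  # scores
```

With a batch size of 1, the stage name together with the offending input narrows the search to one example and one part of the graph.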

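【Discussion】:

Thanks Aaron, I will give that a shot. I did go through the training data thoroughly with the good old PyCharm debugger, and everything seemed to be in order. I also created a "dummy" model with much smaller training data randomly generated by NumPy. I still got the NaN errors. I strongly suspect that something is amiss with the model itself, but I can't figure out what.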