使用 tensorflow 缓冲欠载和 ResourceExhausted 错误答案

【问题标题】：Buffer underrun and ResourceExhausted errors with tensorflow使用 tensorflow 缓冲欠载和 ResourceExhausted 错误
【发布时间】：2017-05-21 16:54:30
【问题描述】：

我正在上高中，我正在尝试做一个涉及神经网络的项目。我正在使用 Ubuntu 并尝试使用 tensorflow 进行强化学习，但是当我训练神经网络时，我总是收到很多欠载警告。它们采用ALSA lib pcm.c:7963:(snd_pcm_recover) underrun occurred 的形式。随着训练的进行，此消息会越来越频繁地打印到屏幕上。最终，我得到一个 ResourceExhaustedError 并且程序终止。以下是完整的错误信息：

W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[320000,512]
Traceback (most recent call last):
  File "./train.py", line 121, in <module>
    loss, _ = model.train(minibatch, gamma, sess) # Train the model based on the batch, the discount factor, and the tensorflow session.
  File "/home/perrin/neural/dqn.py", line 174, in train
    return sess.run([self.loss, self.optimize], feed_dict=self.feed_dict) # Runs the training.  This is where the underrun errors happen
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[320000,512]
     [[Node: gradients/fully_connected/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](dropout/mul, gradients/fully_connected/BiasAdd_grad/tuple/control_dependency)]]

Caused by op u'gradients/fully_connected/MatMul_grad/MatMul_1', defined at:
  File "./train.py", line 72, in <module>
    model = AC_Net([None, 201, 201, 3], 5, trainer) # This creates the neural network using the imported AC_Net class.
  File "/home/perrin/neural/dqn.py", line 128, in __init__
    self.optimize = trainer.minimize(self.loss) # This tells the trainer to adjust the weights in such a way as to minimize the loss.  This is what actually
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 269, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 335, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 482, in gradients
    in_grads = grad_fn(op, *out_grads)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 731, in _MatMulGrad
    math_ops.matmul(op.inputs[0], grad, transpose_a=True))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

...which was originally created as op u'fully_connected/MatMul', defined at:
  File "./train.py", line 72, in <module>
    model = AC_Net([None, 201, 201, 3], 5, trainer) # This creates the neural network using the imported AC_Net class.
  File "/home/perrin/neural/dqn.py", line 63, in __init__
    net = slim.fully_connected(net, 512, activation_fn=tf.nn.elu, scope='fully_connected') # Feeds the input through a fully connected layer
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1350, in fully_connected
    outputs = standard_ops.matmul(inputs, weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[320000,512]
     [[Node: gradients/fully_connected/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](dropout/mul, gradients/fully_connected/BiasAdd_grad/tuple/control_dependency)]]

我研究了这些问题，但并不清楚如何解决这些问题。我对编程很陌生，所以我不太了解缓冲区和数据读取/写入的工作原理。我对这些错误感到非常困惑。有谁知道我的代码的哪些部分可能导致此问题以及如何解决？感谢您花时间考虑这个问题！

这是我定义神经网络的代码（基于this tutorial）：

#! /usr/bin/python

import numpy as np
import tensorflow as tf
slim = tf.contrib.slim

# The neural network
class AC_Net:
    # This defines the actual neural network.
    # output_size:  the number of outputs of the policy
    # trainer:  the tensorflow training optimizer used by the network
    def __init__(self, input_shape, output_size, trainer):

        with tf.name_scope('input'):
            self.input = tf.placeholder(shape=list(input_shape), dtype=tf.float32, name='input')
            net = tf.image.per_image_standardization(self.input[0])
            net = tf.expand_dims(net, [0])

        with tf.name_scope('convolution'):
            net = slim.conv2d(net, 32, [8, 8], activation_fn=tf.nn.elu, scope='conv')
            net = slim.max_pool2d(net, [2, 2], scope='pool')

        net = slim.flatten(net)
        net = tf.nn.dropout(net, .5)
        net = slim.fully_connected(net, 512, activation_fn=tf.nn.elu, scope='fully_connected')
        net = tf.nn.dropout(net, .5)

        with tf.name_scope('LSTM'):
            cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True, activation=tf.nn.elu)

            with tf.name_scope('state_in'):
                state_in = cell.zero_state(tf.shape(net)[0], tf.float32)

            net = tf.expand_dims(net, [0])  
            step_size = tf.shape(self.input)[:1]
            output, state = tf.nn.dynamic_rnn(cell, net, initial_state=state_in, sequence_length=step_size, time_major=False, scope='LSTM')

        out = tf.reshape(output, [-1, 256])
        out = tf.nn.dropout(out, .5)
        self.policy = slim.fully_connected(out, output_size, activation_fn=tf.nn.softmax, scope='policy')

        self.value = slim.fully_connected(out, 1, activation_fn=None, scope='value')

        # Defines the loss functions
        with tf.name_scope('loss_function'):
            self.target_values = tf.placeholder(dtype=tf.float32, name='target_values') # The target value is the discounted reward.
            self.actions = tf.placeholder(dtype=tf.int32, name='actions') # This is the network's policy.
            # The advantage is the difference between what the network thought the value of an action was, and what it actually was.
            # It is computed as R - V(s), where R is the discounted reward and V(s) is the value of being in state s.   
            self.advantages = tf.placeholder(dtype=tf.float32, name='advantages') 

            with tf.name_scope('entropy'):
                entropy = -tf.reduce_sum(tf.log(self.policy + 1e-10) * self.policy)
            with tf.name_scope('responsible_actions'):
                actions_onehot = tf.one_hot(self.actions, output_size, dtype=tf.float32)    
                responsible_actions = tf.reduce_sum(self.policy * actions_onehot, [1]) # This returns only the actions that were selected. 

            with tf.name_scope('loss'):

                with tf.name_scope('value_loss'):
                    self.value_loss = tf.reduce_sum(tf.square(self.target_values - tf.reshape(self.value, [-1])))

                with tf.name_scope('policy_loss'):
                    self.policy_loss = -tf.reduce_sum(tf.log(responsible_actions + 1e-10) * self.advantages)

                with tf.name_scope('total_loss'):
                    self.loss = self.value_loss + self.policy_loss - entropy * .01

                tf.summary.scalar('loss', self.loss)

        with tf.name_scope('gradient_clipping'):
            tvars = tf.trainable_variables()
            grads = tf.gradients(self.loss, tvars)          
            grads, _ = tf.clip_by_global_norm(grads, 20.)
        self.optimize = trainer.apply_gradients(zip(grads, tvars))

    def predict(self, inputs, sess):
        return sess.run([self.policy, self.value], feed_dict={self.input:inputs})

    def train(self, train_batch, gamma, sess):

        inputs = train_batch[:, 0]
        actions = train_batch[:, 1]
        rewards = train_batch[:, 2]
        values = train_batch[:, 4]

        discounted_rewards = rewards[::-1]
        for i, j in enumerate(discounted_rewards):
            if i > 0:
                discounted_rewards[i] += discounted_rewards[i - 1] * gamma
        discounted_rewards = np.array(discounted_rewards, np.float32)[::-1] 
        advantages = discounted_rewards - values 
        self.feed_dict = {
                self.input:np.vstack(inputs), 
                self.target_values:discounted_rewards, 
                self.actions:actions,
                self.advantages:advantages
                }
        return sess.run([self.loss, self.optimize], feed_dict=self.feed_dict)

这是我训练神经网络的代码：

#! /usr/bin/python

import game_env, move_right, move_right_with_obs, random, inspect, os
import tensorflow as tf
import numpy as np
from dqn import AC_Net

def process_outputs(x):
    a = [int(x > 2), int(x%2 == 0 and x > 0)*2-int(x > 0)]  
    return a

environment = game_env # The environment to use
env_name = str(inspect.getmodule(environment).__name__) # The name of the environment

ep_length = 2000
num_episodes = 20

total_steps = ep_length * num_episodes # The total number of steps
model_path = '/home/perrin/neural/nn/' + env_name

learning_rate = 1e-4 # The learning rate
trainer = tf.train.AdamOptimizer(learning_rate=learning_rate) # The gradient descent optimizer used
first_epsilon = 0.6 # The initial chance of random action
final_epsilon = 0.01 # The final chance of random action
gamma = 0.9
anneal_steps = 35000 # The number of steps it takes to go from initial to random

count = 0 # Keeps track of the number of steps we've run
experience_buffer = [] # Stores the agent's experiences in a list
buffer_size = 10000 # How large the experience buffer can be
train_step = 256 # How often to train the model
batches_per_train = 10
save_step = 500 # How often to save the trained model
batch_size = 256 # How many experiences to train on at once
env_size = 500 # How many pixels tall and wide the environment should be.
load_model = True # Whether or not to load a pretrained model
train = True # Whether or not to train the model
test = False # Whether or not to test the model

tf.reset_default_graph()

sess = tf.InteractiveSession()

model = AC_Net([None, 201, 201, 3], 5, trainer)
env = environment.Env(env_size)
action = [0, 0]
state, _ = env.step(True, action)

saver = tf.train.Saver() # This saves the model
epsilon = first_epsilon
tf.global_variables_initializer().run()

if load_model:
    ckpt = tf.train.get_checkpoint_state(model_path)
    saver.restore(sess, ckpt.model_checkpoint_path) 
    print 'Model loaded'

prev_out = None

while count <= total_steps and train:

    if random.random() < epsilon or count == 0:
        if prev_out is not None:
            out = prev_out
        if random.randint(0, 100) == 100 or prev_out is None:
            out = np.random.rand(5)
            out = np.array([val/np.sum(out) for val in out])
            _, value = model.predict(state, sess)
            prev_out = out

    else:
        out, value = model.predict(state, sess)
        out = out[0]
    act = np.random.choice(out, p=out)
    act = np.argmax(out == act)
    act1 = process_outputs(act)
    action[act1[0]] = act1[1]
    _, reward = env.step(True, action)
    new_state = env.get_state()

    experience_buffer.append((state, act, reward, new_state, value[0, 0]))

    state = new_state

    if len(experience_buffer) > buffer_size:
        experience_buffer.pop(0)

    if count % train_step == 0 and count > 0:
        print "Training model"
        for i in range(batches_per_train):
        # Get a random sample of experiences and train the model based on it.
            x = random.randint(0, len(experience_buffer)-batch_size)
            minibatch = np.array(experience_buffer[x:x+batch_size])
            loss, _ = model.train(minibatch, gamma, sess)
            print "Loss for batch", str(i+1) + ":", loss


    if count % save_step == 0 and count > 0:
        saver.save(sess, model_path+'/model-'+str(count)+'.ckpt')
        print "Model saved"

    if count % ep_length == 0 and count > 0:
        print "Starting new episode"
        env = environment.Env(env_size)

    if epsilon > final_epsilon:
        epsilon -= (first_epsilon - final_epsilon)/anneal_steps

    count += 1

while count <= total_steps and test:
    out, _ = model.predict(state, sess)
    out = out[0]
    act = np.random.choice(out, p=out)
    act = np.argmax(out == act)
    act1 = process_outputs(act)
    action[act1[0]] = act1[1]
    state, reward = env.step(True, action)
    new_state = env.get_state()
    count += 1

# Write log files to create tensorboard visualizations
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('/home/perrin/neural/summaries', sess.graph)
if train:
    summary = sess.run(merged, feed_dict=model.feed_dict)
    writer.add_summary(summary)
writer.flush()

【问题讨论】：

您的内存不足，您可以尝试使用较小的批处理大小吗？
@YaroslavBulatov 感谢您的建议。我尝试了批量大小为 10，但仍然出现所有错误。
批量大小 1 怎么样？如果内存不足，你需要让你的网络更小，或者使用内存更大的机器
@YaroslavBulatov 批量大小为 1 时也会发生同样的情况。因为它不会立即耗尽内存，所以我认为它在训练时会以某种方式填满内存。除了使用较小的网络或获得更多内存之外，还有什么方法可以处理此类问题？
理论上，运行调用之间的内存不应该增长。在实践中，我发现如果修改张量大小，内存会增长。 IE，如果张量的大小都相同，它只会重用它在之前的运行调用中为这些大小预先分配的内存。此外，我运行了批量大小为 2000 且适合 TitanX 内存的 A3C。如果您提供可重现的示例，我可以对其进行分析并查看 RAM 的去向。

标签： python-2.7 tensorflow out-of-memory deep-learning underflow

【解决方案1】：

您的内存不足。您的网络可能需要比运行所需更多的内存，因此跟踪过度内存使用的第一步是找出使用这么多内存的原因。

这是一种使用时间线和 statssummarizer 的方法： https://gist.github.com/yaroslavvb/08afccbe087171881ceafc0c98abca05

这将打印出几张表，其中一张是按最高内存使用量排序的张量。你应该检查你那里没有异常大的东西。

您还可以使用 Chrome 可视化工具查看内存时间线，详见here

一种更高级的技术是绘制内存分配/解除分配的时间线，如 issue 中所做的那样

理论上，如果您不创建新的有状态操作（变量），则您的内存使用量不应在步骤之间增加，但我发现如果您的张量大小在步骤之间发生变化，全局内存分配可能会增加。

解决方法是定期将参数保存到检查点并重新启动脚本。

【讨论】：