How to properly manage memory and batch size with TensorFlow (answers)

[Question title]: How to properly manage memory and batch size with TensorFlow
[Posted]: 2016-12-09 18:58:15
[Question]:

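I am using TensorFlow to build a simple feed-forward neural network, and I am using variable-size batches. I am not using a GPU, I have 8 GB of RAM, and I am running Python 3.5.2.

My problem is that some of my batches are too big and trigger the typical out-of-memory error. I understand that, and by itself it is not a problem. However, if I use Keras with the TF backend, the error does not occur. I have built an example below (with fixed-size batches) that illustrates this.

Is there a problem with my implementation? How should I handle batches that are too big?

TensorFlow example (runs out of memory)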


import numpy as np
import tensorflow as tf

n_observations = 100000
n_input = 6
batch_size = 20000
X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5]+ np.random.rand(n_observations)

n_hidden = 16
n_output = 1

def generatebatch(n_observations, batch_size):
    for batch_i in range(n_observations // batch_size):
        start = batch_i*batch_size
        end = start + batch_size
        batch_xs = X[start:end, :]
        batch_ys = Y[start:end]
        yield batch_xs, batch_ys

with tf.Session() as sess:
    # placeholders for input and target
    net_input = tf.placeholder(tf.float32, [None, n_input])
    y_true = tf.placeholder(tf.float32)

    # Hidden Layer
    W1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
    b1 = tf.Variable(tf.random_normal([n_hidden]))
    net_output1 = tf.nn.relu(tf.matmul(net_input, W1) + b1)

    # Yet another Hidden Layer
    yaW1 = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
    yab1 = tf.Variable(tf.random_normal([n_hidden]))
    yanet_output1 = tf.nn.relu(tf.matmul(net_output1, yaW1) + yab1)

    # Output Layer
    W2 = tf.Variable(tf.random_normal([n_hidden, n_output]))
    b2 = tf.Variable(tf.random_normal([n_output]))
    net_output2 = tf.nn.relu(tf.matmul(yanet_output1, W2) + b2)

    # The loss function
    cost = tf.reduce_mean(tf.pow(y_true - net_output2, 2))

    # Configure the optimizer
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Initialize variables
    sess.run(tf.global_variables_initializer())

    n_epochs = 100
    for epoch_i in range(n_epochs):
        batchloss = []
        for batch_xs, batch_ys in generatebatch(n_observations, batch_size):
            _, loss = sess.run(
                [optimizer, cost],
                feed_dict={
                    net_input: batch_xs,
                    y_true: batch_ys
            })
            batchloss.append(loss)
        print(np.mean(batchloss))

Keras example (somehow handles the batch size)


import numpy as np
from keras.models import Sequential
from keras.layers import Dense
import logging

#just to hide the deprecation warnings
logging.basicConfig(level=logging.CRITICAL)

n_input = 6
n_observations = 100000
n_hidden = 16
n_epochs = 10
batch_size = 35000

# input data
X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5]+ np.random.rand(n_observations)

# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(n_hidden, input_dim=n_input, activation='relu'))
model.add(Dense(n_hidden, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, Y, nb_epoch=n_epochs, batch_size=batch_size, verbose=1)

[Comments]:

Tags: python-3.x memory-management tensorflow keras

[Answer 1]:

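Your Y has the wrong shape. It is one-dimensional, (n_observations,), while the network output is a column, so TensorFlow probably infers the wrong shape for an intermediate tensor (for example (20000, 20000) instead of (20000, 1)), which consumes a lot of memory. Reshape Y into a column vector: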

Y = np.reshape(Y, [n_observations, 1])

Your placeholders should then be declared with matching shapes:

net_input = tf.placeholder(tf.float32, shape=[None, n_input])
y_true = tf.placeholder(tf.float32, shape=[None, 1])
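
The shape problem can be reproduced with NumPy alone, since TensorFlow follows the same broadcasting rules for elementwise operations. The following is a minimal sketch, not the asker's code: n is a small stand-in for the batch size of 20000, and y_pred is a hypothetical array standing in for net_output2.

```python
import numpy as np

n = 4  # small stand-in for the batch size of 20000

y_true = np.random.rand(n)      # shape (n,), like a batch of the unreshaped Y
y_pred = np.random.rand(n, 1)   # shape (n, 1), like the network output

# (n,) minus (n, 1) broadcasts both operands to (n, n)
diff_wrong = y_true - y_pred
print(diff_wrong.shape)

# With the targets reshaped to a column, the result stays (n, 1)
diff_right = y_true.reshape(n, 1) - y_pred
print(diff_right.shape)
```

With n = 20000 the first difference has 400 million elements, while the second has only 20000.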

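[Comments]:

That was it, thank you. Could you elaborate on "maybe tensorflow does a lot of reshaping and memory leaking"? Y and y_true both had the same [wrong] shape, (n_observations,), before. Why do we have to do this?
Sorry, thinking about it again, I came up with another, more likely "maybe": TensorFlow infers the shapes of intermediate tensors at sess.run from the shapes of the feed_dict values. With the wrong X, Y and placeholder setup, some internal tensor can end up with shape (20000, 20000) instead of (20000, 1), which consumes a lot of memory.
OK, that makes sense to me.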
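
To put the two shapes in perspective, here is a rough back-of-the-envelope sketch of how much memory one float32 tensor of each shape needs. The helper tensor_bytes is hypothetical and only multiplies the dimensions; it does not measure what TensorFlow actually allocates.

```python
batch_size = 20000
bytes_per_float32 = 4

def tensor_bytes(shape, itemsize=bytes_per_float32):
    """Return the size in bytes of a dense tensor with the given shape."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total

wrong = tensor_bytes((batch_size, batch_size))  # accidentally broadcast shape
right = tensor_bytes((batch_size, 1))           # intended column shape

print(wrong / 1e9, "GB per tensor")
print(right / 1e3, "KB per tensor")
```

A single (20000, 20000) float32 tensor takes 1.6 GB. The forward and gradient computations of the loss are likely to hold several temporaries of that shape at once, which would be enough to exhaust 8 GB of RAM.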

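[Answer 2]:

I think Keras is overriding the default configuration options in TensorFlow. Your native TensorFlow code runs fine on a GPU with smaller batch sizes (e.g. 10k, 15k). With the default configuration, however, TensorFlow assumes you want to use the GPU, and the OOM occurs because there is not enough GPU memory.

Your TensorFlow example works fine once the default behavior is changed to CPU (which is what you describe in your question). Here are the lines I changed to do that: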

config = tf.ConfigProto(
    log_device_placement=True, allow_soft_placement=True
)
config.gpu_options.allow_growth = True


with tf.Session(config=config) as sess, \
        tf.device('cpu:0'):  # placeholders for input and target
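
If the goal is only to keep TensorFlow off the GPU, a commonly used alternative is to hide all CUDA devices through the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. This is a minimal sketch under that assumption, not part of the answer's code, and it would not cure an OOM that is caused by tensor shapes rather than by the device.

```python
import os

# Hide all CUDA devices from this process.
# This must run before `import tensorflow` to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf  # imported after this point, it should see no GPU
print(os.environ["CUDA_VISIBLE_DEVICES"])
```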

[Comments]:

I already tried that, but I got the message Device mapping: no known devices. I tried other ways to force CPU only, but I still get the OOM: W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[20000,20000] [[Node: gradients/Pow_grad/Select = Select[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/Pow_grad/Greater, gradients/Pow_grad/Log, gradients/Pow_grad/zeros_like)]]