TensorFlow save model: GraphDef cannot be larger than 2GB

【Question title】: Tensorflow save model: GraphDef cannot be larger than 2GB
【Posted】: 2016-12-15 22:49:31
【Question】:

I am getting the following error, apparently while saving my model:

Step = 1799  |  Tensorflow Accuracy = 1.0
Step = 1799  |  My Accuracy = 0.0363355780022
Step = 1800  |  Tensorflow Accuracy = 1.0
Step = 1800  |  My Accuracy = 0.0364694929089
Traceback (most recent call last):
  File "CNN-LSTM-seg-reg-sigmoid.py", line 290, in <module>
    saver.save(sess, save_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1085, in save
    self.export_meta_graph(meta_graph_filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1103, in export_meta_graph
    add_shapes=True),
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2175, in as_graph_def
    result, _ = self._as_graph_def(from_version, add_shapes)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2138, in _as_graph_def
    raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.

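Here it is suggested to watch out for tf.constants, but my program has zero constants. However, my weights and biases are defined like this: tf.Variable(tf.random_normal([32]),name="bc1"). Could that be the problem?

If not, then this tells me that I am adding to the graph somewhere on every loop iteration, but I'm not sure where that happens.

My first guess is the place where I make a prediction. I make predictions with the following code...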

# Make prediction
im = Image.open('/home/volcart/Documents/Data/input_crops/temp data0001.tif')
batch_x = np.array(im)
batch_x = batch_x.reshape((1, n_input_x, n_input_y))
batch_x = batch_x.astype(float)
prediction = sess.run(pred, feed_dict={x: batch_x})
prediction = tf.sigmoid(prediction.reshape((n_input_x * n_input_y, n_classes)))
prediction = prediction.eval().reshape((n_input_x, n_input_y, n_classes))

My second guess is the place where I compute loss and accuracy: loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})

My whole session code looks like this:

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()

gpu_options = tf.GPUOptions()
config = tf.ConfigProto(gpu_options=gpu_options)
config.gpu_options.allow_growth = True

# Launch the graph
with tf.Session(config=config) as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph) #initialize graph for tensorboard
    step = 1
    # Import data
    data = scroll_data.read_data('/home/volcart/Documents/Data/')
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = data.train.next_batch(batch_size)
        # Run optimization op (backprop)
        batch_x = batch_x.reshape((batch_size, n_input_x, n_input_y))
        batch_y = batch_y.reshape((batch_size, n_input_x, n_input_y))
        batch_y = convert_to_2_channel(batch_y, batch_size)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        step = step + 1

        loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                          y: batch_y})


        # Make prediction
        im = Image.open('/home/volcart/Documents/Data/input_crops/temp data0001.tif')
        batch_x = np.array(im)
        batch_x = batch_x.reshape((1, n_input_x, n_input_y))
        batch_x = batch_x.astype(float)
        prediction = sess.run(pred, feed_dict={x: batch_x})
        prediction = tf.sigmoid(prediction.reshape((n_input_x * n_input_y, n_classes)))
        prediction = prediction.eval().reshape((n_input_x, n_input_y, n_classes))

        # Temp arrays are to splice the prediction n_input_x x n_input_y x 2
            # into 2 matrices n_input_x x n_input_y
        temp_arr1 = np.empty((n_input_x, n_input_y))
        for i in xrange(n_input_x):
            for j in xrange(n_input_x):
                for k in xrange(n_classes):
                    if k == 0:
                        temp_arr1[i][j] = 1 - prediction[i][j][k]

        my_acc = accuracy_custom(temp_arr1,batch_y[0,:,:,0])

        print "Step = " + str(step) + "  |  Tensorflow Accuracy = " + str(acc)
        print "Step = " + str(step) + "  |  My Accuracy = " + str(my_acc)

        if step % 100 == 0:
            save_path = "/home/volcart/Documents/CNN-LSTM-reg-model/CNN-LSTM-seg-step-" + str(step) + "-model.ckpt"
            saver.save(sess, save_path)
            csv_file = "/home/volcart/Documents/CNN-LSTM-reg/CNNLSTMreg-step-" + str(step) + "-accuracy-" + str(my_acc) + ".csv"
            np.savetxt(csv_file, temp_arr1, delimiter=",")

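【Comments】:

Does it crash immediately? Try saving at every step. If it only crashes after a number of steps, then something is wrong with the model.

Tags: machine-learning neural-network artificial-intelligence tensorflow

【Solution 1】:

You are growing your graph on this line: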

prediction = tf.sigmoid(prediction.reshape((n_input_x * n_input_y, n_classes)))

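This converts your prediction numpy array into a TensorFlow constant node, inlines it into the Graph, and adds a Sigmoid node on top of it. Because the line runs on every loop iteration, the graph gains new nodes (each holding a copy of the array) at every step until the GraphDef exceeds 2GB.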
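To make the growth visible, here is a minimal sketch that counts the ops in a graph while repeating the same pattern. It assumes a TensorFlow build that provides `tensorflow.compat.v1`; `op_counts_per_step` and `fake_prediction` are hypothetical names used only for illustration, not part of the original script.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow with the v1 compat API


def op_counts_per_step(n_steps):
    """Return the number of ops in the graph after each simulated loop step."""
    counts = []
    with tf.Graph().as_default() as graph:
        with tf.Session() as sess:
            for _ in range(n_steps):
                # Hypothetical stand-in for the array returned by sess.run(pred, ...).
                fake_prediction = np.zeros((4, 2), dtype=np.float32)
                # Same pattern as the problematic line: a NumPy array handed to a tf op,
                # which creates new nodes in the graph each time it is executed.
                grown = tf.sigmoid(fake_prediction)
                sess.run(grown)
                counts.append(len(graph.get_operations()))
    return counts


op_counts = op_counts_per_step(3)
print(op_counts)  # the count keeps increasing from step to step
```

In the real script the inlined array is a full image-sized prediction, so the serialized graph grows much faster than in this toy example.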

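You can catch issues like this by adding tf.get_default_graph().finalize() before starting your training loop. A finalized graph is read-only, so any line that tries to add a node raises an error right away instead of silently growing the graph.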
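As a rough illustration of what finalizing does, the sketch below builds a tiny graph, finalizes it, and then repeats the offending pattern. It assumes `tensorflow.compat.v1` is available; the `tf.identity` model and the function name `finalize_catches_growth` are hypothetical stand-ins for the real network.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow with the v1 compat API


def finalize_catches_growth():
    """Return the error message raised when an op is added to a finalized graph."""
    with tf.Graph().as_default():
        x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
        pred = tf.identity(x, name="pred")  # hypothetical stand-in for the model output
        tf.get_default_graph().finalize()   # the graph is read-only from here on
        with tf.Session() as sess:
            # Running existing ops is still allowed on a finalized graph.
            values = sess.run(pred, feed_dict={x: np.zeros((4, 2), dtype=np.float32)})
            try:
                tf.sigmoid(values)  # tries to add new nodes to the graph
            except RuntimeError as err:
                return str(err)
    return None


finalize_message = finalize_catches_growth()
print(finalize_message)
```

The error points straight at the line that modifies the graph, which is usually easier to debug than a failure at save time many steps later.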

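【Discussion】:

If I use tf.get_default_graph().finalize(), I can't add new nodes, such as the line causing the problem. Do I need to create a node before the Session and then call sess.run() on that specific node?
All TensorFlow nodes can (and should) be defined before starting the session, to avoid errors like yours.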
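One way to follow that advice, sketched under the assumption that `tensorflow.compat.v1` is available: build the sigmoid node once on top of the model output before the session starts, and only call `sess.run` on it inside the loop. The `tf.identity` model, `pred_prob`, and `run_with_prebuilt_sigmoid` are hypothetical names for illustration.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow with the v1 compat API


def run_with_prebuilt_sigmoid(n_steps):
    """Run a sigmoid node that was defined once, before the session was created."""
    probs = None
    with tf.Graph().as_default() as graph:
        x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
        pred = tf.identity(x, name="pred")              # hypothetical stand-in for the model
        pred_prob = tf.sigmoid(pred, name="pred_prob")  # defined once, outside the loop
        graph.finalize()                                # guard against accidental growth
        with tf.Session() as sess:
            for _ in range(n_steps):
                batch = np.zeros((4, 2), dtype=np.float32)
                # Only runs existing nodes; nothing is added to the graph here.
                probs = sess.run(pred_prob, feed_dict={x: batch})
    return probs


probs = run_with_prebuilt_sigmoid(3)
print(probs.shape)
```

Since the result of `sess.run` is already a NumPy array, it can be reshaped directly without calling `.eval()`.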

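【Solution 2】:

You can rewrite the following line of code using tf.placeholder:

prediction = tf.sigmoid(prediction.reshape((n_input_x * n_input_y, n_classes)))

That is, create the placeholder and the sigmoid op once, outside the loop, and feed the numpy array through feed_dict on each iteration. This will solve the problem.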
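A minimal sketch of the placeholder approach, assuming `tensorflow.compat.v1` is available. The names `raw_pred`, `squashed`, and `sigmoid_via_placeholder`, as well as the value of `n_classes`, are hypothetical and chosen only for this example.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TensorFlow with the v1 compat API

n_classes = 2  # hypothetical value for illustration


def sigmoid_via_placeholder(arrays):
    """Apply one pre-built sigmoid op to each NumPy array via feed_dict."""
    results = []
    with tf.Graph().as_default() as graph:
        # Built once: a placeholder that receives the raw NumPy prediction.
        raw_pred = tf.placeholder(tf.float32, shape=[None, n_classes], name="raw_pred")
        squashed = tf.sigmoid(raw_pred)
        graph.finalize()
        with tf.Session() as sess:
            for arr in arrays:
                # The array is fed in as data, so no constant node is created.
                results.append(sess.run(squashed, feed_dict={raw_pred: arr}))
    return results


inputs = [np.zeros((6, n_classes), dtype=np.float32),
          np.ones((6, n_classes), dtype=np.float32)]
outputs = sigmoid_via_placeholder(inputs)
print(outputs[0].shape)
```

In the original script, the fed array would be `prediction.reshape((n_input_x * n_input_y, n_classes))`, and the result can then be reshaped back with NumPy.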
