Tensorflow error when training: Caused by op 'shuffle_batch'

【Question title】: Tensorflow error when training: Caused by op 'shuffle_batch'
【Posted】: 2017-09-07 15:50:04
【Question】:

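I am trying to read images and labels from a TFRecord file and then train on them. I know the TFRecord file exists, and I have checked that it really does contain 1000 images and labels. The problem only seems to appear when I feed the data through the input pipeline for training. I am new to Python and TensorFlow and don't know how to fix it.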

I get the following error at tf.train.shuffle_batch:

...

Caused by op 'shuffle_batch', defined at: File "C:/AI/projects/DataGen/train.py", line 40, in images_batch, labels_batch = tf.train.shuffle_batch([image, label], batch_size=10, capacity=1000,min_after_dequeue=2)

...

Here is my code, pieced together from various MNIST examples:

import tensorflow as tf


def read_and_decode_single_example(filename):
    # first construct a queue containing a list of filenames.
    # this lets a user split up their dataset into multiple files to keep
    # the size down
    filename_queue = tf.train.string_input_producer([filename],
                                                num_epochs=None)
    # Unlike the TFRecordWriter, the TFRecordReader is symbolic
    reader = tf.TFRecordReader()
    # One can read a single serialized example from a filename
    # serialized_example is a Tensor of type string.
    _, serialized_example = reader.read(filename_queue)
    # The serialized example is converted back to actual values.
    # One needs to describe the format of the objects to be returned

    feature = {'image': tf.FixedLenFeature([], tf.string),
           'label': tf.FixedLenFeature([], tf.int64)}

    features = tf.parse_single_example(serialized_example, features=feature)

    # now return the converted data
    label = tf.cast(features['label'], tf.float32)
    image = tf.decode_raw(features['image'], tf.float32)
    image = tf.reshape(image, [28, 28, 3])
    return label, image


with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())

    # get single examples
    label, image = read_and_decode_single_example("train.tfrecords")

    image = tf.cast(image, tf.float32) / 255.

    # groups examples into batches randomly
    images_batch, labels_batch = tf.train.shuffle_batch([image, label], batch_size=10, capacity=1000, min_after_dequeue=2)

    # The model is:
    #
    # Y = softmax( X * W + b)
    #              X: matrix for rgb images of 28x28 pixels, flattened (there are 100 images in a mini-batch)
    #              W: weight matrix with (28x28x3) lines and 10 columns
    #              b: bias vector with 10 dimensions
    #              +: add with broadcasting: adds the vector to each line of the matrix (numpy)
    #              softmax(matrix) applies softmax on each line
    #              softmax(line) applies an exp to each value then divides by the norm of the resulting line
    #              Y: output matrix with 100 lines and 10 columns

    # input X: 28x28x3 RGB images
    X = images_batch
    # correct answers will go here
    Y_ = labels_batch
    # weights W[28 * 28 * 3, 10]
    W = tf.Variable(tf.zeros([28 * 28 * 3, 10]))
    # biases b[10]
    b = tf.Variable(tf.zeros([10]))

    # flatten the images into a single line of pixels
    # -1 in the shape definition means "the only possible dimension that will preserve the number of elements"
    XX = tf.reshape(X, [-1, 28 * 28 * 3])

    # The model
    Y = tf.nn.softmax(tf.matmul(XX, W) + b)

    # loss function: cross-entropy = - sum( Y_i * log(Yi) )
    #                           Y: the computed output vector
    #                           Y_: the desired output vector

    # cross-entropy
    # log takes the log of each element, * multiplies the tensors element by element
    # reduce_mean will add all the components in the tensor
    # so here we end up with the total cross-entropy for all images in the batch
    cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 100.0  # normalized for batches of 100 images,
    # *10 because  "mean" included an unwanted division by 10

    # accuracy of the trained model, between 0 (worst) and 1 (best)
    correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # training, learning rate = 0.005
    train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cross_entropy)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(100 + 1):
        print(i)
        sess.run(train_step)

    coord.request_stop()

    # Wait for threads to stop
    coord.join(threads)
    sess.close()

Tags: python tensorflow

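【Solution 1】:

I moved the initialization so that it comes right before the tf.train.start_queue_runners call, that is, after the model has been set up, and that solved the problem: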

sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
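
One way to see why the ordering matters, offered as a minimal sketch rather than a diagnosis of the exact traceback in the question: tf.global_variables_initializer() only covers the variables that already exist in the graph at the moment it is called, so calling it before W and b are defined initializes nothing. The sketch assumes TensorFlow 2.x with the tf.compat.v1 API (on TensorFlow 1.x the same calls should be available directly under tf). The variable W and its shape are illustrative stand-ins, not the model from the question.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: TF 2.x; on TF 1.x use `tf` directly

graph = tf.Graph()
with graph.as_default():
    # Built before any variable exists: this initializer covers nothing.
    init_too_early = tf1.global_variables_initializer()

    # Illustrative stand-in for the model's weights.
    W = tf1.Variable(tf.zeros([4, 2]), name="W")

    # Built after the model is defined: this initializer covers W.
    init_after_model = tf1.global_variables_initializer()

    # Op that lists the names of variables that are still uninitialized.
    uninitialized = tf1.report_uninitialized_variables()

with tf1.Session(graph=graph) as sess:
    # Mirrors the question: initialization runs before the model exists.
    sess.run(init_too_early)
    names_before = sess.run(uninitialized)

    # Mirrors the answer: initialization runs after the model is set up.
    sess.run(init_after_model)
    names_after = sess.run(uninitialized)

print("uninitialized after early init:", len(names_before))
print("uninitialized after late init:", len(names_after))
```

In the question's script the same effect would apply to W and b: both are created after the two sess.run(...initializer()) calls, so those calls cannot have initialized them.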
