【Title】: TF slice_input_producer not keeping tensors in sync
【Posted】: 2017-09-19 22:08:43
【Question】:

I am reading images into my TF network, but I also need the associated labels.

So I tried to follow this answer, but the labels I get out don't actually match the images in each batch.

My image names are of the form dir/3.jpg, so I just extract the label from the image filename.

truth_filenames_np = ...
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)

# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]

labels_tf = tf.convert_to_tensor(labels)

# *** This line should make sure both input tensors are synced (from my limited understanding)
# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer([truth_filenames_tf, labels_tf], shuffle=False)


truth_image_value = tf.read_file(truth_image_name)
truth_image = tf.image.decode_jpeg(truth_image_value)
truth_image.set_shape([IMAGE_DIM, IMAGE_DIM, 3])
truth_image = tf.cast(truth_image, tf.float32)
truth_image = truth_image/255.0

# Another key step, where I batch them together
truth_images_batch, truth_label_batch = tf.train.batch([truth_image, truth_label], batch_size=mb_size)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch = truth_images_batch.eval()
        X_label_batch = truth_label_batch.eval()

        # Here I display all the images in this batch, and then I check which file numbers they actually are. 
        # BUT, the images that are displayed don't correspond with what is printed by X_label_batch!
        print(X_label_batch)
        plot_batch(X_truth_batch)



    coord.request_stop()
    coord.join(threads)

Am I doing something wrong, or does slice_input_producer not actually ensure that its input tensors are synced?

Aside:

I also noticed that when I get a batch from tf.train.batch, the elements of the batch are adjacent in the original list I gave it, but the batch order is not the original order. Example: if my data is ["dir/1.jpg", "dir/2.jpg", "dir/3.jpg", "dir/4.jpg", "dir/5.jpg", "dir/6.jpg"], then I might get the batch (with batch_size=2) ["dir/3.jpg", "dir/4.jpg"], then the batch ["dir/1.jpg", "dir/2.jpg"], and then the last one. So this makes it hard to even use a FIFO queue for just the labels, since its order won't match the batch order.

【Discussion】:

  • Could you edit the code down to the bare minimum that reproduces the problem? E.g., remove all the image processing and see whether the images/labels are still shuffled; we can't run this code unless we have the files.

Tags: python machine-learning tensorflow


【Solution 1】:

Here is a complete, runnable example that reproduces the problem:

import tensorflow as tf

truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)

# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False)

# # Another key step, where I batch them together
# truth_images_batch, truth_label_batch = tf.train.batch(
#     [truth_image_name, truth_label], batch_size=11)

epochs = 7

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch = truth_image_name.eval()
        X_label_batch = truth_label.eval()
        # Here I display all the images in this batch, and then I check
        # which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is
        # printed by X_label_batch!
        print(X_truth_batch)
        print(X_label_batch)
    coord.request_stop()
    coord.join(threads)

This prints:

Epoch  0
b'dir/0.jpg'
b'1.jpg'
Epoch  1
b'dir/2.jpg'
b'3.jpg'
Epoch  2
b'dir/4.jpg'
b'5.jpg'
Epoch  3
b'dir/6.jpg'
b'7.jpg'
Epoch  4
b'dir/8.jpg'
b'9.jpg'
Epoch  5
b'dir/10.jpg'
b'11.jpg'
Epoch  6
b'dir/12.jpg'
b'13.jpg'

So basically each eval call runs the operation again! Adding the batching makes no difference: it just prints the batches (the first 11 filenames, then the next 11 labels, and so on).
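The desync can be mimicked in plain Python, with no TensorFlow needed. As a sketch (the filenames here are made up to match the question's layout), model the producer as a counter where every independent "eval" dequeues the next (filename, label) pair. Two separate fetches then return pieces of two different pairs, while one combined fetch returns a matched pair:

```python
from itertools import count

# Plain-Python stand-in for the slice_input_producer queue: every call
# to fetch_pair() dequeues the next (filename, label) pair, just as
# every eval() call re-runs the dequeue op.
counter = count()

def fetch_pair():
    i = next(counter)
    return ("dir/%d.jpg" % i, "%d.jpg" % i)

# Two independent fetches (like truth_image_name.eval() followed by
# truth_label.eval()): each one advances the queue, so the results
# come from different elements.
name = fetch_pair()[0]   # filename from element 0
label = fetch_pair()[1]  # label from element 1
print(name, label)       # dir/0.jpg 1.jpg  -> out of sync

# A single combined fetch (like sess.run([truth_image_name,
# truth_label])): the queue advances once and the pair stays matched.
name, label = fetch_pair()
print(name, label)       # dir/2.jpg 2.jpg  -> in sync
```

This is exactly why the single-sess.run variants below stay in sync: both tensors are fetched in one run step, so the producer is only advanced once.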

The workaround I see is:

for i in range(epochs):
    print("Epoch ", i)
    pair = tf.convert_to_tensor([truth_image_name, truth_label]).eval()
    print(pair[0])
    print(pair[1])

which correctly prints:

Epoch  0
b'dir/0.jpg'
b'0.jpg'
Epoch  1
b'dir/1.jpg'
b'1.jpg'
# ...

but does nothing about the behavior itself, which violates the principle of least surprise.

Edit: another way of doing it:

import tensorflow as tf

truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False)
epochs = 7
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners(sess=sess)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch, X_label_batch = sess.run(
            [truth_image_name, truth_label])
        print(X_truth_batch)
        print(X_label_batch)

This is a better way, since tf.convert_to_tensor and co. only accept tensors of the same type/shape etc.

Note that I removed the coordinator for simplicity, which however results in a warning:

W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\kernels\queue_base.cc:294] _0_input_producer/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed

See this.

【Discussion】:
