【Posted】: 2016-06-07 06:10:35
【Question】:
I am modifying the CIFAR-10 multi-GPU TensorFlow code to read the ImageNet dataset.
The changes I made are:
cifar10.py:
1) Changed tf.app.flags.DEFINE_string('data_dir', ...)
2) Removed the part after data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
3) Removed the download part from maybe_download_and_extract()
cifar10_input.py:
1) IMAGE_SIZE = 227
2) result.height = 256 and result.width = 256
3) Changed
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in xrange(1, 6)]
to
filenames = [os.path.join(data_dir, i) for i in os.listdir(data_dir)]
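As a sanity check on the listing change above, the sketch below reports what the new expression would collect and whether each file fits a fixed-length record layout. It assumes the reader still expects CIFAR-style binary records of 1 label byte followed by height * width * 3 image bytes, as in the unmodified cifar10_input.py. The helper name inspect_data_dir is hypothetical and not part of TensorFlow.

```python
import os

# Assumed CIFAR-style record layout: 1 label byte + raw RGB pixels.
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 256, 256, 3
RECORD_BYTES = LABEL_BYTES + HEIGHT * WIDTH * DEPTH  # 196609


def inspect_data_dir(data_dir, record_bytes=RECORD_BYTES):
    """Hypothetical helper: report what os.listdir() would feed the reader.

    Returns a list of (path, size_in_bytes, holds_whole_records) tuples.
    Subdirectories are skipped here, because os.listdir() returns them too.
    """
    report = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        holds_whole_records = size > 0 and size % record_bytes == 0
        report.append((path, size, holds_whole_records))
    return report
```

Running inspect_data_dir(FLAGS.data_dir) before building the graph shows whether the directory holds only files made of whole records. JPEG files or nested per-class directories would generally not fit this layout.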
But this raises an ugly error:
tensorflow.python.framework.errors.OutOfRangeError: RandomShuffleQueue '_1_tower_0/shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 128, current size 0)
[[Node: tower_0/shuffle_batch = QueueDequeueMany[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/shuffle_batch/random_shuffle_queue, tower_0/shuffle_batch/n/_775)]]
[[Node: tower_1/shuffle_batch/n/_664 = _HostSend[T=DT_INT32, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_170_tower_1/shuffle_batch/n", _device="/job:localhost/replica:0/task:0/gpu:1"]]]
Caused by op u'tower_0/shuffle_batch', defined at:
  File "lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_multi-gpu_train.py", line 224, in <module>
    tf.app.run()
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_multi-gpu_train.py", line 222, in main
    train()
  File "lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_multi-gpu_train.py", line 150, in train
    loss = tower_loss(scope)
  File "lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_multi-gpu_train.py", line 65, in tower_loss
    images, labels = cifar10.distorted_inputs()
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10.py", line 119, in distorted_inputs
    batch_size=FLAGS.batch_size)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_input.py", line 153, in distorted_inputs
    min_queue_examples, batch_size)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/models/image/cifar10/cifar10_input.py", line 104, in _generate_image_and_label_batch
    min_after_dequeue=min_queue_examples)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 496, in shuffle_batch
    return queue.dequeue_many(batch_size, name=name)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 287, in dequeue_many
    self._queue_ref, n, self._dtypes, name=name)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 319, in _queue_dequeue_many
    timeout_ms=timeout_ms, name=name)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 664, in apply_op
    op_def=op_def)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1834, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/saoni.m/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1043, in __init__
    self._traceback = _extract_stack()
When I trace back to the line that calls shuffle_batch():
images, label_batch = tf.train.shuffle_batch(
    [image, label],
    batch_size=batch_size,
    num_threads=num_preprocess_threads,
    capacity=min_queue_examples + 3 * batch_size,
    min_after_dequeue=min_queue_examples)
The values passed to it are: batch_size 128, num_threads 16, capacity 20384, min_after_dequeue 20000
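These numbers match what the stock tutorial code would compute, as the arithmetic below shows. It assumes the unmodified constants from cifar10_input.py (50000 training examples per epoch and a minimum queue fraction of 0.4). Neither value is stated in the question, so treat them as assumptions.

```python
# Assumed defaults from the unmodified CIFAR-10 tutorial code.
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
MIN_FRACTION_OF_EXAMPLES_IN_QUEUE = 0.4
BATCH_SIZE = 128

# Minimum number of examples kept in the shuffle queue.
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                         MIN_FRACTION_OF_EXAMPLES_IN_QUEUE)

# Same expression as the capacity argument passed to shuffle_batch.
capacity = min_queue_examples + 3 * BATCH_SIZE

print(min_queue_examples)  # 20000
print(capacity)            # 20384
```

With min_after_dequeue at 20000, at least that many examples have to be enqueued before the first batch of 128 is returned, so a reported size of 0 suggests the reader threads produced no examples at all.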
【Discussion】:
Tags: python python-2.7 tensorflow