Batching for a non-image data set with TensorFlow

[Question title]: Batching for a non-image data set with Tensorflow
[Posted]: 2016-08-16 03:12:40
[Question]:

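I am a beginner with TensorFlow. I have a data set with 43 inputs and one output, and I want to split it into mini-batches to train a deep learning model.

Here are my inputs: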

x = tf.placeholder(tf.float32, shape=[None, 43])
y_ = tf.placeholder(tf.float32, shape=[None])

I feed them from a MATLAB file:

train_mat = train_mat["binary_train"].value
feed_dict = {x: train_mat[0:100, 0:43], y_: train_mat[0:100, 43]}

Instead of always taking records 0:100, I would like to use random batches. I came across

tf.train.batch

However, I do not understand how it works. Could you please guide me on how to do this?

Thanks, Afshin

[Question comments]:

Tags: python random tensorflow batching

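[Solution 1]:

tf.train.batch and similar methods are queue-based and are best suited to loading large numbers of samples in parallel and asynchronously. The TensorFlow documentation describes the basics of using queues, and there is also a blog post on how to read data from files.

If you use queues, you do not need placeholder and feed_dict.

For your specific case, a potential solution could look like this: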

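import tensorflow as tf

# Assumes the TensorFlow 1.x queue API and that train_mat is a float array of
# shape [n_samples, 44]: 43 input columns followed by one label column.
# capacity and min_after_dequeue could be set according to your case
q = tf.RandomShuffleQueue(capacity=1000, min_after_dequeue=500,
                          dtypes=[tf.float32], shapes=[[44]])
# enqueue_many splits train_mat along axis 0, so each queue element is one row
enq = q.enqueue_many([train_mat])
tf.train.add_queue_runner(tf.train.QueueRunner(q, [enq]))

row = q.dequeue()  # a single randomly chosen row of shape [44]
features = row[0:43]
label = row[43]

# group single rows into batches of 100: x is [100, 43], y_ is [100]
x, y_ = tf.train.batch([features, label], batch_size=100)

# then you can use x and y_ directly in inference and train process,
# after calling tf.train.start_queue_runners(sess) in your session.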

The code above rests on a few assumptions, because the information provided is incomplete. Still, I hope it gives you some ideas.
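
When the whole data set fits in memory, an alternative to queues is to keep the placeholders from the question and pick the random rows yourself. What follows is only a minimal sketch and not part of the original answer: `minibatch_indices` is a hypothetical helper, the batch size of 100 mirrors the question, and `train_step` and `sess` in the usage comment stand for whatever training op and session you already have.

```python
import random


def minibatch_indices(n_samples, batch_size, seed=None):
    """Yield lists of row indices covering one shuffled pass over the data."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)  # a fresh random order for this pass
    for start in range(0, n_samples, batch_size):
        # The last batch may be shorter when n_samples is not a multiple
        # of batch_size.
        yield order[start:start + batch_size]


# Usage with the placeholders from the question (assumes train_mat is a
# NumPy array of shape [n_samples, 44]; train_step and sess are hypothetical):
#
# for idx in minibatch_indices(train_mat.shape[0], 100):
#     sess.run(train_step, feed_dict={x: train_mat[idx, 0:43],
#                                     y_: train_mat[idx, 43]})
```

Each call produces one epoch in which every row appears exactly once, so calling it again per epoch gives a different order each time.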

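[Comments]:

It gives the error: AttributeError: 'module' object has no attribute 'train'. One question: why did you choose 1000 and 500 in RandomShuffleQueue?
Sorry, that was a typo. It should be tf.nn.batch rather than tf.nn.train.
Again it says AttributeError: 'module' object has no attribute 'batch'.
My mistake. It is tf.train.batch. Because the code is incomplete, I have not tried it in a real environment. But I hope you can get some ideas from the snippet.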
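
On the question about 1000 and 500, which the thread leaves open: in tf.RandomShuffleQueue the first argument is the capacity, the maximum number of elements the queue can hold, and the second is min_after_dequeue, the minimum number of elements that must remain after a dequeue so that each draw comes from a reasonably large pool. The specific values in the answer appear to be placeholders to tune. The toy class below is a plain-Python sketch of that behaviour, not TensorFlow code: `ToyShuffleQueue` and its methods are hypothetical names, and where the real queue would block, this version raises an error instead.

```python
import random


class ToyShuffleQueue:
    """Plain-Python imitation of capacity / min_after_dequeue semantics."""

    def __init__(self, capacity, min_after_dequeue, seed=None):
        self.capacity = capacity
        self.min_after_dequeue = min_after_dequeue
        self._items = []
        self._rng = random.Random(seed)

    def enqueue(self, item):
        # The real queue would block here until space becomes available.
        if len(self._items) >= self.capacity:
            raise RuntimeError("queue is full")
        self._items.append(item)

    def dequeue(self):
        # The real queue would block until enough elements have arrived.
        if len(self._items) <= self.min_after_dequeue:
            raise RuntimeError("too few elements to dequeue")
        # Remove and return one element chosen uniformly at random.
        pos = self._rng.randrange(len(self._items))
        return self._items.pop(pos)


q = ToyShuffleQueue(capacity=1000, min_after_dequeue=500, seed=0)
for i in range(501):
    q.enqueue(i)
first = q.dequeue()  # allowed: 500 elements remain afterwards
```

A larger min_after_dequeue mixes the rows more thoroughly but costs memory and a longer warm-up before the first batch, and the capacity has to exceed it so the enqueue side has room to keep the queue topped up.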