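【Posted】: 2018-02-05 18:40:08
【Question】:
I am trying to implement a TensorFlow DNNRegressor that uses a tensor with multiple labels, but it keeps failing with an error I don't understand. I did 95% of my testing on TensorFlow 1.4.1 and have just switched to 1.5.0 / CUDA 9, but it still fails (you know, I was just hoping :)).
For reference, I used the boston example and the pandas input func source code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/input_fn/boston.py https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/python/estimator/inputs/pandas_io.py
In the following gist you can find the full Python code, the generated output, the training data and the (currently unused) test data. Both the training and the test data are very small, just enough to build the code. https://gist.github.com/anonymous/c3e9fbe5f5faf373fa230909347318cd
The error message is as follows (the stack trace is in the gist; I did not paste it here to avoid cluttering the post):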
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels shape must be [batch_size, 20]] [Condition x == y did not hold element-wise:] [x (dnn/head/labels/assert_equal/x:0) = ] [20] [y (dnn/head/labels/strided_slice:0) = ] [3] [[Node: dnn/head/labels/assert_equal/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/head/labels/assert_equal/All/_151, dnn/head/labels/assert_equal/Assert/Assert/data_0, dnn/head/labels/assert_equal/Assert/Assert/data_1, dnn/head/labels/assert_equal/Assert/Assert/data_2, dnn/head/logits/assert_equal/x/_153, dnn/head/labels/assert_equal/Assert/Assert/data_4, dnn/head/labels/strided_slice/_155)]]
The input_fn is as follows:
# Same import that pandas_io.py uses for its queue helpers.
from tensorflow.python.estimator.inputs.queues import feeding_functions

def get_input_fn(dataset,
                 model_labels=None,
                 batch_size=128,
                 num_epochs=1,
                 shuffle=None,
                 queue_capacity=1000,
                 num_threads=1):
    dataset = dataset.copy()
    if queue_capacity is None:
        if shuffle:
            queue_capacity = 4 * len(dataset)
        else:
            queue_capacity = len(dataset)
    min_after_dequeue = max(queue_capacity / 4, 1)

    def input_fn():
        queue = feeding_functions._enqueue_data(
            dataset,
            queue_capacity,
            shuffle=shuffle,
            min_after_dequeue=min_after_dequeue,
            num_threads=num_threads,
            enqueue_size=batch_size,
            num_epochs=num_epochs)
        if num_epochs is None:
            features = queue.dequeue_many(batch_size)
        else:
            features = queue.dequeue_up_to(batch_size)
        assert len(features) == len(dataset.columns) + 1, ('Features should have one '
                                                           'extra element for the index.')
        # Drop the index column, then key the remaining tensors by column name.
        features = features[1:]
        features = dict(zip(list(dataset.columns), features))
        if model_labels is not None:
            # labels = tf.stack([features.pop(model_label) for model_label in model_labels], 0)
            # One tensor per label column, each holding a whole batch.
            labels = [features.pop(model_label) for model_label in model_labels]
            return features, labels
        return features
    return input_fn
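To make the shape complaint in the assertion concrete, here is a minimal sketch in plain Python, with nested lists standing in for tensors. The column names (`label_0` ... `label_19`), the `shape` helper and the values are hypothetical; only the sizes 20 and 3 are taken from the error message. It assumes the list of per-label tensors returned by `input_fn` is treated as one array with the label axis first.

```python
# Hypothetical stand-in data: 3 rows (the batch) and 20 label columns,
# matching the numbers in the assertion message. Lists replace tensors.
batch_size, num_labels = 3, 20
columns = {"label_%d" % j: [float(i * num_labels + j) for i in range(batch_size)]
           for j in range(num_labels)}
model_labels = sorted(columns, key=lambda name: int(name.split("_")[1]))


def shape(nested):
    """Return (outer length, inner length) of a rectangular list of lists."""
    return (len(nested), len(nested[0]))


# What input_fn builds: one entry per label, each holding a whole batch.
per_label = [columns[name] for name in model_labels]

# What the head asks for: one row per example, one column per label.
per_example = [list(row) for row in zip(*per_label)]

print(shape(per_label))    # (20, 3)
print(shape(per_example))  # (3, 20)
```

If that assumption holds, the TensorFlow counterpart of the transpose would presumably be stacking the per-label tensors along axis 1 (`tf.stack(..., axis=1)`) rather than axis 0; this has not been verified against the code in the gist.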
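I am able to train and predict with the following input fn, but it does not look suitable for the amount of data I will use for training later. Also, when I use it with the evaluate method, it gets stuck.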
def get_input_fn(dataset,
                 model_labels=None):
    def input_fn():
        features = {k: tf.constant(len(dataset), shape=[dataset[k].size, 1]) for k in model_features}
        if model_labels is not None:
            labels_data = []
            for i in range(0, len(dataset)):
                temp = []
                for label in model_labels:
                    temp.append(dataset[label].values[i])
                labels_data.append(temp)
            labels = tf.constant(labels_data, shape=[len(dataset), len(model_labels)])
            return features, labels
        else:
            return features
    return input_fn
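Thanks!
Notes: If you check the full code in the gist, you will notice that the number of features and labels depends on the number of categories, which is built dynamically from the seed data. Maybe I could switch to an RNN and map each epoch to a category instead of building that huge matrix, but for now I am focused on getting this test to work.
【Discussion】:
Tags: python tensorflow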