DNNRegressor training input fn multiple labels

【Question title】: DNNRegressor training input fn multiple labels
【Posted】: 2018-02-05 18:40:08
【Question description】:

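I'm trying to implement a TensorFlow DNNRegressor that uses a tensor with multiple labels, but it keeps failing with an error I don't understand. I did 95% of my testing on TensorFlow 1.4.1 and have just switched to 1.5.0 / CUDA 9, but it still fails (you know, I was just hoping :)).

For reference, I used the boston example and the pandas input func source code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/input_fn/boston.py https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/python/estimator/inputs/pandas_io.py

In the following gist you can find the full Python code, the generated output, the training data and the (currently unused) test data. Both the training and test data are tiny; they exist only so I can build the code. https://gist.github.com/anonymous/c3e9fbe5f5faf373fa230909347318cd

The error message is below (the stack trace is in the gist; I left it out here to avoid cluttering the post):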

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels shape must be [batch_size, 20]] [Condition x == y did not hold element-wise:] [x (dnn/head/labels/assert_equal/x:0) = ] [20] [y (dnn/head/labels/strided_slice:0) = ] [3] [[Node: dnn/head/labels/assert_equal/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](dnn/head/labels/assert_equal/All/_151, dnn/head/labels/assert_equal/Assert/Assert/data_0, dnn/head/labels/assert_equal/Assert/Assert/data_1, dnn/head/labels/assert_equal/Assert/Assert/data_2, dnn/head/logits/assert_equal/x/_153, dnn/head/labels/assert_equal/Assert/Assert/data_4, dnn/head/labels/strided_slice/_155)]]

The input_fn is as follows:

def get_input_fn(dataset,
                 model_labels=None,
                 batch_size=128,
                 num_epochs=1,
                 shuffle=None,
                 queue_capacity=1000,
                 num_threads=1):

    dataset = dataset.copy()

    if queue_capacity is None:
        if shuffle:
            queue_capacity = 4 * len(dataset)
        else:
            queue_capacity = len(dataset)

    min_after_dequeue = max(queue_capacity / 4, 1)

    def input_fn():
        queue = feeding_functions._enqueue_data(
            dataset,
            queue_capacity,
            shuffle=shuffle,
            min_after_dequeue=min_after_dequeue,
            num_threads=num_threads,
            enqueue_size=batch_size,
            num_epochs=num_epochs)

        if num_epochs is None:
            features = queue.dequeue_many(batch_size)
        else:
            features = queue.dequeue_up_to(batch_size)

        assert len(features) == len(dataset.columns) + 1, ('Features should have one '
                                                     'extra element for the index.')

        features = features[1:]
        features = dict(zip(list(dataset.columns), features))

        if model_labels is not None:
            #labels = tf.stack([features.pop(model_label) for model_label in model_labels], 0);
            labels = [features.pop(model_label) for model_label in model_labels]

            return features, labels

        return features

    return input_fn
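
One plausible reading of the assertion, sketched here with NumPy rather than TensorFlow: a Python list of 20 per-column tensors, each of shape [batch_size], amounts to stacking along axis 0 and gives [20, batch_size], while the head expects [batch_size, 20]. The sizes (20 labels, a batch of 3) come from the error message; the array contents are made up for illustration. `np.stack` uses the same axis convention as `tf.stack`.

```python
import numpy as np

num_labels, batch_size = 20, 3  # sizes reported in the assertion message

# One 1-D array per label column, each holding a batch of values (illustrative data).
per_label = [np.full(batch_size, i, dtype=np.float32) for i in range(num_labels)]

stacked_axis0 = np.stack(per_label, axis=0)  # what a plain list of columns amounts to
stacked_axis1 = np.stack(per_label, axis=1)  # batch dimension first

print(stacked_axis0.shape)  # (20, 3)
print(stacked_axis1.shape)  # (3, 20)
```

Under that reading, stacking the label columns with `axis=1` in the input_fn above would produce labels of shape [batch_size, 20]. This is an untested suggestion and is not confirmed anywhere in the thread.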

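I was able to train and predict with the following input fn, but it does not look suitable for the amount of data I will train on later. It also hangs when I use it with the evaluate method.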

def get_input_fn(dataset,
                 model_labels=None):

    def input_fn():
        features = {k: tf.constant(len(dataset), shape=[dataset[k].size, 1]) for k in model_features}

        if model_labels is not None:
            labels_data = []
            for i in range(0, len(dataset)):
                temp = []
                for label in model_labels:
                    temp.append(dataset[label].values[i])
                labels_data.append(temp)
            labels = tf.constant(labels_data, shape=[len(dataset), len(model_labels)])

            return features, labels
        else:
            return features

    return input_fn

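Thanks!

Note: if you check the full code in the gist, you'll notice that the number of features and labels depends on the number of categories, which is built dynamically from the seed data. Maybe I could switch to an RNN and map each epoch to one category instead of building that huge matrix, but for now I'm focused on getting this test to work.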

【Question discussion】:

Tags: python tensorflow

【Solution 1】:

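In the end I changed my generation approach a bit. The test code is now split into prepare.py and train.py: prepare.py writes the data to a few CSV files (input data and categories), and in train.py I replaced the input fn with one that loads those CSVs, builds a dataset, and parses each dataset line with tf.decode_csv (plus a few extras).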

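import tensorflow as tf

# One integer default per CSV column: the user id, then the features, then the labels.
# model_features and model_labels are lists of column names defined elsewhere.
csv_field_defaults = [[0]] * (1 + len(model_features) + len(model_labels))

def _parse_line(line):
    # Decode one CSV row into a list of scalar tensors, one per column.
    fields = tf.decode_csv(line, csv_field_defaults)

    # Remove the user id
    fields.pop(0)

    features = dict(zip(model_features + model_labels, fields))
    # Stack the per-row label scalars into one tensor of shape [len(model_labels)].
    labels = tf.stack([features.pop(model_label) for model_label in model_labels])

    return features, labels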

def csv_input_fn(csv_path, batch_size):
    dataset = tf.data.TextLineDataset(csv_path).skip(1)
    dataset = dataset.map(_parse_line)
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()

# Configure TensorFlow logging
tf.logging.set_verbosity(tf.logging.INFO)

# Initialize the neural network
feature_cols = [tf.feature_column.numeric_column(k) for k in model_features]
regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      label_dimension=len(model_labels),
                                      hidden_units=[4096, 2048, 1024, 512],
                                      model_dir="tf_model")
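
To make the row-level logic of `_parse_line` easier to follow, here is a plain-Python mirror of it that uses the standard `csv` module instead of TensorFlow. The column names (`f1`, `f2`, `l1`, `l2`) and the sample row are hypothetical; the real lists are built dynamically in the gist.

```python
import csv
import io

# Hypothetical column names standing in for the dynamically built lists.
model_features = ["f1", "f2"]
model_labels = ["l1", "l2"]

def parse_line(line):
    """Split one CSV row into a feature dict and a label list."""
    fields = [int(v) for v in next(csv.reader(io.StringIO(line)))]
    fields.pop(0)  # drop the leading user id
    features = dict(zip(model_features + model_labels, fields))
    labels = [features.pop(name) for name in model_labels]
    return features, labels

features, labels = parse_line("42,1,2,3,4")
print(features)  # {'f1': 1, 'f2': 2}
print(labels)    # [3, 4]
```

Because each row yields one label vector of length len(model_labels), batching the rows afterwards puts the batch dimension first, which matches the [batch_size, label_dimension] shape named in the assertion from the question.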

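I can currently process 10,000 records, but I need to parse far more data, and I hope this implementation performs better.

csv_input_fn comes from the TensorFlow examples; I modified _parse_line to handle the features and labels as I need.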

【Discussion】: