TensorFlow LinearRegressor: Feature cannot have rank 0

【Question title】: Tensorflow LinearRegressor Feature Cannot have rank 0
【Posted】: 2019-12-11 18:36:54
【Question】:

I am following a tutorial, but I failed to build a linear regressor for a dataset generated from y = x. Below is the last part of my code; the complete source code is available if you want to reproduce my error:

_CSV_COLUMN_DEFAULTS = [[0],[0]]
_CSV_COLUMNS = ['x', 'y']

def input_fn(data_file):

    def parse_csv(value):
        print('Parsing', data_file)
        columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
        features = dict(zip(_CSV_COLUMNS, columns))
        labels = features.pop('y')
        return features, labels

    # Extract lines from input files using the Dataset API.
    dataset = tf.data.TextLineDataset(data_file)
    dataset = dataset.map(parse_csv)

    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels

x = tf.feature_column.numeric_column('x')
base_columns = [x]

model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearRegressor(model_dir=model_dir, feature_columns=base_columns)

model = model.train(input_fn=lambda: input_fn(data_file=file_path))

This code fails with the following error message:

ValueError: Feature (key: x) cannot have rank 0. Give: Tensor("IteratorGetNext:0", shape=(), dtype=int32, device=/device:CPU:0)

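Because of how TensorFlow works, I find it hard to tell from this message where things actually go wrong.

【Comments】:

I think the estimator iterates over the dataset by itself during training, so in input_fn you can simply return the dataset instead of creating an iterator.

Tags: python tensorflow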

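【Solution 1】:

As far as I know, the first dimension of each value is batch_size, so input_fn should return its data in batches.

Once you return the data as batches, it works, for example: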

dataset = tf.data.TextLineDataset(data_file)
dataset = dataset.map(parse_csv)
dataset = dataset.batch(10) # or any other batch size
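
Putting the comment above and this answer together, a complete input_fn might look like the sketch below. It is an illustration rather than the asker's actual fix: it assumes TF 2.x names (tf.io.decode_csv instead of tf.decode_csv), returns the batched dataset directly instead of creating an iterator, and introduces a hypothetical batch_size parameter with a default of 10.

```python
import tensorflow as tf

CSV_COLUMNS = ['x', 'y']
CSV_COLUMN_DEFAULTS = [[0], [0]]


def parse_csv(line):
    # Each line is a scalar string, so every parsed column is a rank-0 tensor.
    columns = tf.io.decode_csv(line, record_defaults=CSV_COLUMN_DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    labels = features.pop('y')
    return features, labels


def input_fn(data_file, batch_size=10):
    # batch_size is a hypothetical parameter added for this sketch.
    dataset = tf.data.TextLineDataset(data_file)
    dataset = dataset.map(parse_csv)
    # Batching adds the leading dimension, so each feature becomes rank 1.
    dataset = dataset.batch(batch_size)
    # Return the dataset itself; no explicit iterator is created here.
    return dataset
```

It would be passed to the estimator the same way as in the question, for example model.train(input_fn=lambda: input_fn(file_path)).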

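【Comments】:

I had been struggling all day with the same problem using the KMeansClustering estimator and had no idea what was going on. This saved me. Thanks!

【Solution 2】:

The "Feature cannot have rank 0" problem occurs when no batch_size is specified in the input_fn, eval_fn, or predict_fn used with the Estimator API. The code below shows how the tensor shape changes with batch_size. It is meant for TF 2.0; to run it on earlier versions, enable eager execution (tf.enable_eager_execution()). In the two snippets below, note how the shape of the output tensors differs with and without batch_size.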

    ##### Contents of test.csv #####
    # feature1, feature2,label
    # 234, 235, 24
    # 345, 345,26
    # 234, 345, 28
    # 432, 567, 29
    ################################
    import tensorflow as tf

    # TF 1.x only: eager execution is on by default in TF 2.x,
    # where this call is not available under this name.
    tf.enable_eager_execution()

    CSV_COLUMNS = ['feature1', 'feature2', 'label']
    CSV_COLUMN_DEFAULTS = [[0.0], [0.0], [0.0]]

    def parse_csv(value):
        # tf.decode_csv is the TF 1.x name; TF 2.x exposes it as tf.io.decode_csv.
        columns = tf.decode_csv(value, record_defaults=CSV_COLUMN_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        labels = features.pop('label')
        return features, labels

Without batch size:

    dataset = tf.data.TextLineDataset(filenames='./test.csv').skip(count=1)
    dataset = dataset.map(parse_csv)
    for i in dataset:
        print(i)

Output tensor shape here is shape=():

    ({'feature1': <tf.Tensor: id=247, shape=(), dtype=float32, numpy=234.0>, 'feature2': <tf.Tensor: id=248, shape=(), dtype=float32, numpy=235.0>}, <tf.Tensor: id=249, shape=(), dtype=float32, numpy=24.0>)
    ({'feature1': <tf.Tensor: id=253, shape=(), dtype=float32, numpy=345.0>, 'feature2': <tf.Tensor: id=254, shape=(), dtype=float32, numpy=345.0>}, <tf.Tensor: id=255, shape=(), dtype=float32, numpy=26.0>)
    ({'feature1': <tf.Tensor: id=259, shape=(), dtype=float32, numpy=234.0>, 'feature2': <tf.Tensor: id=260, shape=(), dtype=float32, numpy=345.0>}, <tf.Tensor: id=261, shape=(), dtype=float32, numpy=28.0>)
    ({'feature1': <tf.Tensor: id=265, shape=(), dtype=float32, numpy=432.0>, 'feature2': <tf.Tensor: id=266, shape=(), dtype=float32, numpy=567.0>}, <tf.Tensor: id=267, shape=(), dtype=float32, numpy=29.0>)

With batch size:

    dataset = tf.data.TextLineDataset(filenames='./test.csv').skip(count=1)
    dataset = dataset.map(parse_csv).batch(batch_size=2)
    for i in dataset:
        print(i)

Output tensor shape here is shape=(2,):

    ({'feature1': <tf.Tensor: id=442, shape=(2,), dtype=float32, numpy=array([234., 345.], dtype=float32)>, 'feature2': <tf.Tensor: id=443, shape=(2,), dtype=float32, numpy=array([235., 345.], dtype=float32)>}, <tf.Tensor: id=444, shape=(2,), dtype=float32, numpy=array([24., 26.], dtype=float32)>)
    ({'feature1': <tf.Tensor: id=448, shape=(2,), dtype=float32, numpy=array([234., 432.], dtype=float32)>, 'feature2': <tf.Tensor: id=449, shape=(2,), dtype=float32, numpy=array([345., 567.], dtype=float32)>}, <tf.Tensor: id=450, shape=(2,), dtype=float32, numpy=array([28., 29.], dtype=float32)>)


Using batch_size with dataset would solve the issue "Feature cannot have rank 0".
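
The same effect can be seen without TensorFlow. The sketch below is plain Python, not part of any TensorFlow API: rank and batch are hypothetical helpers written only for this illustration, and the rows reuse the values from test.csv. It shows that an unbatched feature is a bare scalar (rank 0), while grouping rows into batches gives each feature one leading dimension (rank 1).

```python
from itertools import islice


def rank(value):
    """Count nested list levels: 0 for a scalar, 1 for a flat list, and so on."""
    levels = 0
    while isinstance(value, list):
        levels += 1
        value = value[0] if value else None
    return levels


def batch(rows, batch_size):
    """Group feature dicts into dicts of lists, adding a leading batch axis."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


rows = [
    {'feature1': 234.0, 'feature2': 235.0},
    {'feature1': 345.0, 'feature2': 345.0},
    {'feature1': 234.0, 'feature2': 345.0},
    {'feature1': 432.0, 'feature2': 567.0},
]

unbatched_ranks = [rank(row['feature1']) for row in rows]
batched = list(batch(rows, batch_size=2))
batched_ranks = [rank(b['feature1']) for b in batched]

print(unbatched_ranks)  # [0, 0, 0, 0]
print(batched_ranks)    # [1, 1]
```

The rank-0 case corresponds to what the estimator rejects; the rank-1 case corresponds to the shape=(2,) tensors printed above.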
