[Question title]: How to define an embedding column in TensorFlow 2.0?
[Posted]: 2020-05-20 18:08:11
[Question description]:

I am new to TensorFlow and am following this tutorial, https://www.tensorflow.org/tutorials/structured_data/feature_columns, using CSV data from my local drive. I can load the CSV file and print the column headers:

for feature_batch, label_batch in train_ds.take(1):
  print('Every feature:', list(feature_batch.keys()))
  print('A batch of traffic_type',label_batch)

When I try to create an embedding feature column with
_mt_datetime_embedding = feature_column.embedding_column(_mt_datetime, dimension=8)
demo(_mt_datetime_embedding)

I get this error:

AttributeError: 'EmbeddingColumn' object has no attribute 'num_buckets'. I don't know what went wrong. Can someone help me? Thank you very much.

[Comments]:

    Tags: csv dataset tensorflow2.0


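    [Solution 1]:

    According to the TensorFlow documentation on embedding columns:

    Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grow large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1.

    When a categorical column has many possible values, it is best to use an embedding column.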
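To make the difference between the two representations concrete, here is a minimal NumPy sketch. It is not TensorFlow's implementation: the `embedding_table` values are random stand-ins for weights that would normally be learned, and the vocabulary is borrowed from the `thal` column used later in this answer.

```python
import numpy as np

vocab = ["fixed", "normal", "reversible"]  # categories of the 'thal' column
dimension = 8                              # embedding size used in this answer

rng = np.random.default_rng(0)
# Hypothetical embedding table: one row of weights per category.
embedding_table = rng.normal(size=(len(vocab), dimension))

# Map a small batch of category strings to integer ids.
ids = np.array([vocab.index(v) for v in ["normal", "fixed", "normal"]])

# One-hot encoding: one column per category, entries are only 0 or 1.
one_hot = np.eye(len(vocab))[ids]

# Embedding: look up one dense row per example; entries can be any number.
embedded = embedding_table[ids]

print(one_hot.shape, embedded.shape)
# The lookup is equivalent to multiplying the one-hot matrix by the table.
print(np.allclose(one_hot @ embedding_table, embedded))
```

With only three categories the one-hot vector is actually the smaller of the two. The saving appears when the vocabulary has thousands of entries: the one-hot width grows with the vocabulary, while the embedding width stays at `dimension`.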

    The input to tf.feature_column.embedding_column must be a CategoricalColumn created by any of the categorical_column_* functions.

    Syntax:

    tf.feature_column.embedding_column(
        categorical_column, dimension, combiner='mean', initializer=None,
        ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True,
        use_safe_embedding_lookup=True
    )
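The `combiner` argument in the signature above controls how the embeddings are reduced when a single row contains several category ids. The following NumPy sketch illustrates the three documented options ('mean', 'sum', 'sqrtn') under the assumption that every id has weight 1. The `combine` helper and the tiny `table` are hypothetical and only serve the illustration; they are not part of TensorFlow.

```python
import numpy as np

def combine(vectors, combiner="mean"):
    """Reduce the embedding vectors of one row's ids to a single vector."""
    vectors = np.asarray(vectors, dtype=float)
    total = vectors.sum(axis=0)
    if combiner == "sum":
        return total
    if combiner == "mean":
        return total / len(vectors)
    if combiner == "sqrtn":
        # With unit weights, 'sqrtn' divides the sum by sqrt(number of ids).
        return total / np.sqrt(len(vectors))
    raise ValueError(f"unsupported combiner: {combiner}")

# Hypothetical 3-category embedding table with dimension 2.
table = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
row_ids = [0, 2]  # a row that holds two category ids

for name in ("mean", "sum", "sqrtn"):
    print(name, combine(table[row_ids], name))
```

For a column such as `thal`, where each row holds exactly one value, all three combiners return the same vector, so the default is usually fine.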
    

    When I pass a numeric_column instead of a categorical_column as the input, I get AttributeError: 'NumericColumn' object has no attribute 'num_buckets':

    age_embedding = feature_column.embedding_column(age, dimension=8)
    demo(age_embedding)
    

    Output:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-23-94a5fc74016e> in <module>()
          1 age_embedding = feature_column.embedding_column(age, dimension=8)
    ----> 2 demo(age_embedding)
    
    4 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/feature_column/feature_column_v2.py in create_state(self, state_manager)
       3181     """Creates the embedding lookup variable."""
       3182     default_num_buckets = (self.categorical_column.num_buckets
    -> 3183                            if self._is_v2_column
       3184                            else self.categorical_column._num_buckets)   # pylint: disable=protected-access
       3185     num_buckets = getattr(self.categorical_column, 'num_buckets',
    
    AttributeError: 'NumericColumn' object has no attribute 'num_buckets'
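If a numeric feature such as `age` really needs an embedding, one possible workaround is to turn it into a categorical column first, for example with `tf.feature_column.bucketized_column`. The sketch below assumes a TensorFlow 2.x release in which `tf.feature_column` is still available; the bucket boundaries are illustrative and not tuned for any dataset.

```python
from tensorflow import feature_column

# A numeric column has no notion of buckets, hence the AttributeError above.
age = feature_column.numeric_column("age")
print(hasattr(age, "num_buckets"))

# Bucketizing turns the numeric column into a categorical one.
# Boundaries are an assumption made for this illustration.
age_buckets = feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
print(age_buckets.num_buckets)

# A bucketized column is a valid input for embedding_column.
age_embedding = feature_column.embedding_column(age_buckets, dimension=8)
print(age_embedding.dimension)
```

The resulting `age_embedding` should then be usable wherever `thal_embedding` is used in this answer, for instance as the argument to the `demo` helper.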
    

    When I pass a categorical_column as the input, the values are converted to a dense representation. Here is the complete code.

    import numpy as np
    import pandas as pd
    
    import tensorflow as tf
    
    from tensorflow import feature_column
    from tensorflow.keras import layers
    from sklearn.model_selection import train_test_split
    
    URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
    dataframe = pd.read_csv(URL)
    
    train, test = train_test_split(dataframe, test_size=0.2)
    train, val = train_test_split(train, test_size=0.2)
    print(len(train), 'train examples')
    print(len(val), 'validation examples')
    print(len(test), 'test examples')
    
    def df_to_dataset(dataframe, shuffle=True, batch_size=32):
      dataframe = dataframe.copy()
      labels = dataframe.pop('target')
      ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
      if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
      ds = ds.batch(batch_size)
      return ds
    
    batch_size = 5  # A small batch size is used for demonstration purposes
    train_ds = df_to_dataset(train, batch_size=batch_size)
    val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
    
    example_batch = next(iter(train_ds))[0]
    
    def demo(feature_column):
      feature_layer = layers.DenseFeatures(feature_column)
      print(feature_layer(example_batch).numpy())
    
    age = feature_column.numeric_column("age")
    
    thal = feature_column.categorical_column_with_vocabulary_list(
          'thal', ['fixed', 'normal', 'reversible'])
    
    thal_embedding = feature_column.embedding_column(thal, dimension=8)
    demo(thal_embedding)
    

    Output:

    193 train examples
    49 validation examples
    61 test examples
    
    [[-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
      -0.05250816 -0.13339798]
     [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
      -0.05250816 -0.13339798]
     [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
      -0.05250816 -0.13339798]
     [ 0.3212179   0.29932576 -0.44579896 -0.4998746   0.064592    0.16934885
       0.02404759  0.5051637 ]
     [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
      -0.05250816 -0.13339798]]
    

    For more details, see here.

    [Discussion]:
