【Question Title】: Text Classification with RNN in TensorFlow - AttributeError: '_IndicatorColumn' object has no attribute 'key'
【Posted】: 2018-03-26 18:56:40
【Question Description】:

I'm trying to write a program that uses the DynamicRnnEstimator in TensorFlow to classify blocks of text into categories. Unfortunately, I get this error when I run my code:

AttributeError: '_IndicatorColumn' object has no attribute 'key'

My data.csv file has two columns, "Category" and "Description".

Here is the code I'm currently running in Python 3:

import tensorflow as tf
import pandas as pd
from sklearn import preprocessing
from gensim import corpora
from nltk.tokenize import WhitespaceTokenizer
import string
from tensorflow.contrib.learn.python.learn.estimators import constants
from tensorflow.contrib.learn.python.learn.estimators import rnn_common


data_df = pd.read_csv('data.csv', encoding='ISO-8859-1').astype('U') #data.csv has 2 columns: "Category", and "Description"

raw_descriptions = data_df['Description']

## Calculate vocab size
descriptions = []
for description in raw_descriptions:
    descriptions.append(WhitespaceTokenizer().tokenize(description))

dictionary = corpora.Dictionary(descriptions)
unique_words = len(dictionary.token2id) #how many unique words do we see? use for hash_bucket_size

## Set up Features and Labels
features = raw_descriptions.to_frame() #pandas_input_func needs features in DataFrame format
lab_enc = preprocessing.LabelEncoder()
labels = lab_enc.fit_transform(data_df['Category'])
labels = pd.Series(labels) #pandas_input_func needs the labels in Series format

## Train/Test Split
split = int(.3*len(data_df.index)) #we'll use 30% of our data for testing, 70% for training
features_train = features[:-split]
features_test = features[-split:]
labels_train = labels[:-split]
labels_test = labels[-split:]


n_classes = len(lab_enc.classes_) #how many unique labels do we have?

categorical_column = tf.feature_column.categorical_column_with_hash_bucket('Description', hash_bucket_size=unique_words)
description = tf.feature_column.indicator_column(categorical_column)
feat_cols = [description]

input_func = tf.estimator.inputs.pandas_input_fn(
    x=features_train, 
    y=labels_train, 
    batch_size=100, 
    num_epochs=None, 
    shuffle=False)


classifier = tf.contrib.learn.DynamicRnnEstimator(
    problem_type = constants.ProblemType.CLASSIFICATION,
    prediction_type = rnn_common.PredictionType.SINGLE_VALUE,
    sequence_feature_columns = feat_cols,
    context_feature_columns = None,
    num_units = 5,
    num_classes = n_classes,
    cell_type = 'lstm', 
    optimizer = 'SGD',
    learning_rate = 0.1,
    predict_probabilities = True)

classifier.fit(input_fn=input_func)

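The error occurs on the last line, at classifier.fit(). I'm not sure how to fix it. Since the attribute 'key' is being accessed, I assume something needs to be formatted as a dictionary, but I'm not sure what or why.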
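To make the reasoning about the 'key' attribute concrete, here is a minimal sketch in plain Python. It assumes nothing about TensorFlow's internals: `HypotheticalColumn` is an invented stand-in, not a real TensorFlow class. It only shows that an AttributeError of this shape comes from attribute access on an object (`obj.key`), whereas a missing dictionary entry raises KeyError instead.

```python
class HypotheticalColumn:
    """Invented stand-in for an object that has a name but no `key` attribute."""

    def __init__(self, name):
        self.name = name


column = HypotheticalColumn("Description")

# Attribute access on an object that lacks `key` raises AttributeError.
try:
    column.key
except AttributeError as err:
    attribute_error_message = str(err)

# A missing dictionary entry raises KeyError, a different exception type.
try:
    {"Description": [1, 2]}["key"]
except KeyError as err:
    key_error_type = type(err).__name__

print(attribute_error_message)
print(key_error_type)
```

Read this way, the message in the traceback refers to the `_IndicatorColumn` object itself, so it may be worth checking what kind of column object the estimator expects, in addition to how the input data is formatted.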

Any insight is greatly appreciated!

【Question Discussion】:

    Tags: python tensorflow machine-learning rnn


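    【Solution 1】:

    It looks like your features are set up as a pandas DataFrame rather than a dictionary. For example, the following code shows the expected way to use a pandas DataFrame with TensorFlow: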

    import numpy as np

    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}
    

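    Give it a try; I believe it will solve your problem. In the error you posted, you can see that the internal code is trying to access the 'key' attribute (which is used with dictionaries, but not with DataFrames).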
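As a small, self-contained sketch of the conversion suggested above, the snippet below builds a toy DataFrame and turns it into a dict of NumPy arrays. The sample rows and the names `features_train` and `features_dict` are illustrative assumptions, not data from the question, and the thread does not confirm whether this change alone resolves the AttributeError.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's training features (one text column).
features_train = pd.DataFrame(
    {"Description": ["red apple", "blue car", "green tree"]}
)

# dict(DataFrame) maps each column name to its Series;
# np.array(...) then converts every Series to a NumPy array.
features_dict = {
    key: np.array(value) for key, value in dict(features_train).items()
}

print(type(features_dict))
print(features_dict["Description"])
```

The result is keyed by column name, so the "Description" key lines up with the name passed to the feature column in the question's code.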

    Alex

    【Discussion】:

    • Hello, and thanks for the reply. If I insert this line of code, do I need to use tf.estimator.inputs.numpy_input_fn instead of tf.estimator.inputs.pandas_input_fn, since I'm changing my features from a DataFrame to NumPy arrays?