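【Posted】: 2018-03-26 18:56:40
【Question】:
I am trying to write a program that uses TensorFlow's DynamicRnnEstimator to classify blocks of text into categories. Unfortunately, I get this error when I run the code:
AttributeError: '_IndicatorColumn' object has no attribute 'key'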
My data.csv file has two columns, "Category" and "Description".
Here is the code I am currently running in Python 3:
import tensorflow as tf
import pandas as pd
from sklearn import preprocessing
from gensim import corpora
from nltk.tokenize import WhitespaceTokenizer
import string
from tensorflow.contrib.learn.python.learn.estimators import constants
from tensorflow.contrib.learn.python.learn.estimators import rnn_common
data_df = pd.read_csv('data.csv', encoding='ISO-8859-1').astype('U') #data.csv has 2 columns: "Category", and "Description"
raw_descriptions = data_df['Description']
## Calculate vocab size
descriptions = []
for description in raw_descriptions:
    descriptions.append(WhitespaceTokenizer().tokenize(description))
dictionary = corpora.Dictionary(descriptions)
unique_words = len(dictionary.token2id) #how many unique words do we see? use for hash_bucket_size
## Set up Features and Labels
features = raw_descriptions.to_frame() #pandas_input_func needs features in DataFrame format
lab_enc = preprocessing.LabelEncoder()
labels = lab_enc.fit_transform(data_df['Category'])
labels = pd.Series(labels) #pandas_input_func needs the labels in Series format
## Train/Test Split
split = int(.3*len(data_df.index)) #we'll use 30% of our data for testing, 70% for training
features_train = features[:-split]
features_test = features[-split:]
labels_train = labels[:-split]
labels_test = labels[-split:]
n_classes = len(lab_enc.classes_) #how many unique labels do we have?
categorical_column = tf.feature_column.categorical_column_with_hash_bucket('Description', hash_bucket_size=unique_words)
description = tf.feature_column.indicator_column(categorical_column)
feat_cols = [description]
input_func = tf.estimator.inputs.pandas_input_fn(
    x=features_train,
    y=labels_train,
    batch_size=100,
    num_epochs=None,
    shuffle=False)
classifier = tf.contrib.learn.DynamicRnnEstimator(
    problem_type=constants.ProblemType.CLASSIFICATION,
    prediction_type=rnn_common.PredictionType.SINGLE_VALUE,
    sequence_feature_columns=feat_cols,
    context_feature_columns=None,
    num_units=5,
    num_classes=n_classes,
    cell_type='lstm',
    optimizer='SGD',
    learning_rate=0.1,
    predict_probabilities=True)
classifier.fit(input_fn=input_func)
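For readers without gensim, nltk, or scikit-learn installed, the preprocessing in the listing above can be approximated with the standard library alone. This is a minimal sketch, not the original pipeline: `vocab_size`, `encode_labels`, and `tail_split` are hypothetical helper names, the toy rows are made up, `str.split()` is assumed to tokenize the same way as `WhitespaceTokenizer`, and sorted integer ids are assumed to match what `LabelEncoder` produces.

```python
def vocab_size(texts):
    """Count distinct whitespace-separated tokens across all texts."""
    return len({token for text in texts for token in text.split()})


def encode_labels(labels):
    """Map each label to an integer id assigned in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def tail_split(items, test_fraction=0.3):
    """Hold out the last `test_fraction` of items as the test set."""
    split = int(test_fraction * len(items))
    if split == 0:
        # items[:-0] would be empty, so handle the no-test-rows case explicitly.
        return list(items), []
    return list(items[:-split]), list(items[-split:])


# Toy rows standing in for data.csv (hypothetical values, not the real data).
categories = ["sports", "tech", "sports", "tech", "news"]
descriptions = [
    "the team won the match",
    "the new phone has a fast chip",
    "a late goal won the game",
    "fast chip beats the old phone",
    "the council met on monday",
]

label_ids, classes = encode_labels(categories)
train_x, test_x = tail_split(descriptions)
train_y, test_y = tail_split(label_ids)

print("vocabulary size:", vocab_size(descriptions))
print("classes:", classes)
print("train rows:", len(train_x), "test rows:", len(test_x))
```

One difference from the listing: `features[:-split]` yields an empty result when `split` is 0, which can happen with very small inputs, so the sketch guards that case explicitly.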
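The error occurs on the last line, at classifier.fit(). I'm not sure how to fix it. I assume that because the attribute 'key' is being accessed, something needs to be formatted as a dictionary, but I'm not sure what or why.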
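As a side note on the wording of the message: an AttributeError is raised by attribute access (`obj.key`), not by dictionary lookup (`obj['key']`), so the message alone does not imply that anything has to be a dict. The sketch below reproduces the same kind of error with two hypothetical stand-in classes, `FakeCategoricalColumn` and `FakeIndicatorColumn`. They are not the real TensorFlow classes; that the real `_IndicatorColumn` wraps a categorical column in a comparable way is an assumption based on the error text and on how `indicator_column` is called in the code.

```python
from collections import namedtuple

# Hypothetical stand-ins, NOT the real TensorFlow classes. They only mimic
# a categorical column that carries a `key` and an indicator column that
# wraps a categorical column without exposing a `key` of its own.
FakeCategoricalColumn = namedtuple("FakeCategoricalColumn", ["key", "hash_bucket_size"])
FakeIndicatorColumn = namedtuple("FakeIndicatorColumn", ["categorical_column"])

categorical = FakeCategoricalColumn(key="Description", hash_bucket_size=1000)
indicator = FakeIndicatorColumn(categorical_column=categorical)


def read_key(column):
    """Return column.key, or the error text if the attribute is missing."""
    try:
        return column.key
    except AttributeError as err:
        return "AttributeError: {}".format(err)


print(read_key(categorical))                   # the attribute exists here
print(read_key(indicator))                     # the wrapper has no `key`
print(read_key(indicator.categorical_column))  # reachable via the wrapped column
```

If that assumption holds, the error would point to code that expects a column type with a `key` attribute but receives the wrapped indicator column instead.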
Any insight would be greatly appreciated!
【Discussion】:
Tags: python tensorflow machine-learning rnn