【Posted】: 2020-12-05 04:41:11
【Question】:
I'm trying to use the sklearn TfidfVectorizer on some text data, but I keep getting this error:
ValueError: empty vocabulary; perhaps the documents only contain stop words
Here is the code:
# Imports assumed from the aliases used below; `data` is a DataFrame loaded elsewhere.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer as tfi
from sklearn.preprocessing import normalize

tf_idf_vect = tfi(stop_words='english', max_features=20)
x = data['text']
#data = [tweets.strip() for tweets in x]
#texts = [[word.lower() for word in tweet.split()]]
tf_idf = tf_idf_vect.fit_transform([' '.join(tweet) for tweet in x])  # This line is causing the error
tf_idf_norm = normalize(tf_idf)
tf_idf_array = tf_idf_norm.toarray()
vector = pd.DataFrame(tf_idf_array,
                      columns=tf_idf_vect.get_feature_names())
vector.head()
Any ideas?
【Discussion】:
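One likely cause, assuming data['text'] holds plain strings (the question does not show the data): calling ' '.join(tweet) on a string puts a space between every character, so each document turns into a run of single letters. TfidfVectorizer's default token_pattern only keeps tokens of two or more word characters, which would leave nothing to build a vocabulary from. The sketch below reproduces that with a made-up `tweets` list standing in for data['text']; the helper `can_fit` is hypothetical and exists only for the demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for data['text']: each entry is one raw string.
tweets = [
    "I love machine learning",
    "Python makes text mining easy",
]

# Joining a *string* with ' ' separates its characters, not its words.
broken_docs = [' '.join(tweet) for tweet in tweets]


def can_fit(docs):
    """Return True if a vocabulary can be built from docs, False otherwise."""
    vect = TfidfVectorizer(stop_words='english', max_features=20)
    try:
        vect.fit(docs)
        return True
    except ValueError:
        # Raised when no token survives tokenization and stop-word removal.
        return False


print(can_fit(broken_docs))  # single-character tokens are all discarded
print(can_fit(tweets))       # raw strings tokenize into real words

# Passing the strings directly lets the vectorizer do its own tokenizing.
fixed_vect = TfidfVectorizer(stop_words='english', max_features=20)
tf_idf = fixed_vect.fit_transform(tweets)
print(sorted(fixed_vect.vocabulary_))
```

If the column really does hold strings, passing `x` straight to fit_transform should be enough. The ' '.join(...) step would only be appropriate if each row were already a list of word tokens.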
Tags: python scikit-learn nlp tf-idf tfidfvectorizer