【Posted】: 2018-10-11 20:44:43
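【Question】:
I am trying to fit an SVM model for text classification, but the line x = text_clf_svm.fit(file_name, target_file) raises an error. I have tried various approaches and could not resolve it.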
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from io import StringIO
import numpy as np
count_vect = CountVectorizer(stop_words=None, input='file')
file_name = open('./svmtest.txt', 'r').read().splitlines()
target_file = open('./target.txt', 'r').read().splitlines()
file_name = [StringIO(x) for x in file_name]
X_train_counts = count_vect.fit_transform(file_name)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
text_clf_svm = Pipeline([('vect', CountVectorizer(stop_words=None,
                                                  input='file')),
                         ('tfidf', TfidfTransformer()),
                         ('clf-svm', SGDClassifier(loss='hinge', penalty='l2',
                                                   alpha=1e-3, n_iter=5,
                                                   random_state=42)),
                         ])
x = text_clf_svm.fit(file_name, target_file)
Python traceback:
  File "/Users/aravind/PycharmProjects/PycharmProjects!/minorproject/src/svmClassifier.py", line 27, in <module>
    x = text_clf_svm.fit(file_name, target_file)
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/pipeline.py", line 213, in _fit
    **fit_params_steps[name])
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/Users/aravind/venv/PycharmProjects!/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
Contents of my svmtest.txt:
train is so bad it is very dirty
great and awesome train
Contents of my target.txt:
0
1
I am using this simple data for testing only, and it produces the error above. I am not sure what the problem is.
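One possible cause, offered as a guess rather than a confirmed diagnosis: with input='file', CountVectorizer calls read() on every document, and the same StringIO objects are passed first to count_vect.fit_transform and then again to the pipeline. A StringIO that has already been read to the end returns an empty string on the next read, which would leave the pipeline's vectorizer with nothing to build a vocabulary from. The sketch below uses only the standard library. The names first_pass, second_pass and rewound are made up for illustration, and the two list comprehensions merely stand in for the two vectorizer passes.

```python
from io import StringIO

# The two sample documents from svmtest.txt
lines = ["train is so bad it is very dirty", "great and awesome train"]
docs = [StringIO(line) for line in lines]

# First consumer (stands in for count_vect.fit_transform):
# each buffer is read to the end.
first_pass = [doc.read() for doc in docs]

# Second consumer (stands in for the CountVectorizer inside the Pipeline):
# the read position is already at the end, so every read returns "".
second_pass = [doc.read() for doc in docs]

# Rewinding each buffer makes its content readable again.
for doc in docs:
    doc.seek(0)
rewound = [doc.read() for doc in docs]

print(first_pass)   # the original two sentences
print(second_pass)  # ['', '']
print(rewound)      # the original two sentences again
```

If this is what is happening here, any of the following should avoid the empty vocabulary: dropping the standalone count_vect.fit_transform(file_name) call, rewinding the buffers with seek(0) before fitting the pipeline, or passing the raw strings to a CountVectorizer left at its default input='content'.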
【Comments】:
- Both answers helped to some extent, but I cannot accept both -_-. Thanks a lot!
Tags: python numpy scikit-learn svm