[Posted]: 2013-12-05 12:46:54
[Question]:
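I am new to scikit-learn. I am trying to fit a TF-IDF vectorizer on a 1*M numpy.array, tot_data (in the code below), made up of English sentences. Here 'word' is a numpy.array (1*173) containing a list of stop words. I need to set the stop_words parameter explicitly. If I do not pass stop_words, the code runs fine, but the line below raises an error.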
>>> word = numpy.array(['a','about',...])
>>> vectorizer = TfidfVectorizer(max_df=.95,stop_words=word).fit(tot_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 1203, in fit
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 780, in fit_transform
    vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 710, in _count_vocab
    analyze = self.build_analyzer()
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 225, in build_analyzer
    stop_words = self.get_stop_words()
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 208, in get_stop_words
    return _check_stop_list(self.stop_words)
  File "/usr/local/python2.7/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 85, in _check_stop_list
    if stop == "english":
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
[Discussion]:
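The last frame of the traceback points at the likely cause: `_check_stop_list` evaluates `stop == "english"`, and when `stop` is a numpy array that comparison is elementwise, so the `if` cannot reduce it to a single truth value. A minimal sketch of one workaround is to pass the stop words as a plain Python list. The contents of `word` and `tot_data` below are hypothetical stand-ins, since the question does not show the real data, and the exact error may differ in newer scikit-learn releases.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the question's arrays (the real data is not shown).
word = np.array(["a", "about", "the"])
tot_data = np.array([
    "a story about the sea",
    "the sea is about a mile away",
    "ships sail across the sea",
])

# Comparing an array with a string is elementwise: the result is a boolean
# array with one entry per stop word, which `if` cannot treat as True/False.
comparison = word == "english"

# Converting the array to a plain list sidesteps that comparison.
vectorizer = TfidfVectorizer(max_df=0.95, stop_words=word.tolist()).fit(tot_data)

# The stop words no longer appear in the learned vocabulary.
print(sorted(vectorizer.vocabulary_))
```

`word.tolist()` is used here rather than `list(word)` so that the elements are ordinary Python strings instead of numpy string scalars; either form should avoid the ambiguous comparison.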
Tags: python numpy scikit-learn