【Posted】: 2018-08-25 23:07:27
【Question】:
I built the following classification model:
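from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

def buildData(x):
    # Fits a new vocabulary and new IDF weights on whatever texts are passed in.
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(x)
    tf_transformer = TfidfTransformer().fit(X_train_counts)
    X_train_tf = tf_transformer.transform(X_train_counts)
    return X_train_tf

# parseXml is defined elsewhere; x holds the texts and y the labels.
x, y = parseXml('data/training.xml')
xDev, yDev = parseXml('data/dev.xml')
x = buildData(x)
clf = MultinomialNB().fit(x, y)
predicted = clf.predict(x)
print('Accuracy: ', accuracy_score(y, predicted))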
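I fit the model on the training data "x" and then test it on the same "x".
The problem is that predicting on xDev (predicted = clf.predict(xDev)) raises an error.
I assumed this was because the data had not been prepared (it was not in TF-IDF matrix form), so I passed xDev through the same function:
xDev = buildData(xDev)
That prepared the data, but unfortunately I then got this error: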
Traceback (most recent call last):
  File "C:/Users/BG/Desktop/P2/E2.py", line 43, in <module>
    predicted = clf.predict(xDev)
  File "C:\Python35\lib\site-packages\sklearn\naive_bayes.py", line 66, in predict
    jll = self._joint_log_likelihood(X)
  File "C:\Python35\lib\site-packages\sklearn\naive_bayes.py", line 725, in _joint_log_likelihood
    return (safe_sparse_dot(X, self.feature_log_prob_.T) +
  File "C:\Python35\lib\site-packages\sklearn\utils\extmath.py", line 135, in safe_sparse_dot
    ret = a * b
  File "C:\Python35\lib\site-packages\scipy\sparse\base.py", line 476, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch
【Discussion】:
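The mismatch most likely arises because buildData fits a fresh CountVectorizer every time it is called. Calling it on xDev builds a new vocabulary, so the dev matrix ends up with a different number of columns than the matrix the classifier was trained on. Below is a minimal sketch of one way around this. It assumes toy texts in place of the parseXml output (train_texts, dev_texts and their labels are made up for illustration). The vectorizer and the TF-IDF transformer are fitted on the training texts only, and the dev texts go through transform rather than fit_transform:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the output of parseXml.
train_texts = ["good movie", "great film", "bad movie", "awful film"]
train_labels = ["pos", "pos", "neg", "neg"]
dev_texts = ["good film today", "awful boring movie"]
dev_labels = ["pos", "neg"]

# Fit the vocabulary and the IDF weights on the training texts only.
count_vect = CountVectorizer()
tfidf = TfidfTransformer()
x_train = tfidf.fit_transform(count_vect.fit_transform(train_texts))

# Reuse the fitted objects: transform (not fit_transform) the dev texts,
# so both matrices share the same columns. Unseen words are ignored.
x_dev = tfidf.transform(count_vect.transform(dev_texts))

clf = MultinomialNB().fit(x_train, train_labels)
predicted = clf.predict(x_dev)

print("Train shape:", x_train.shape, "Dev shape:", x_dev.shape)
print("Accuracy:", accuracy_score(dev_labels, predicted))
```

In the original code this would mean having buildData return the fitted count_vect and tf_transformer along with the matrix, or wrapping the steps in a scikit-learn Pipeline so the fitted state travels with the classifier.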
Tags: python scikit-learn