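【Posted】: 2017-02-05 03:06:03
【Question】:
I am trying out the scikit-learn library. I loaded the iris dataset and am trying to train a knn classifier to make some predictions. The code is as follows: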
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
knn = KNeighborsClassifier(n_neighbors=1)
X = iris.data
y = iris.target
print X.shape
print y.shape
#training the model
knn.fit(X, y)
knn.predict([3, 4, 5, 2])
However, I get the following output, which ends in a warning:
(150L, 4L)
(150L,)
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
I googled it and found some workarounds. I tried X = X.reshape(-1, 1) and X = X.reshape(1, -1), but then I get the following error:
Traceback (most recent call last):
File "E:/Analytics Practice/Social Media Analytics/Python Services/DataAnalysis/sk-learn-dir/test.py", line 13, in <module>
knn.fit(X, y)
File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\neighbors\base.py", line 778, in fit
X, y = check_X_y(X, y, "csr", multi_output=True)
File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\utils\validation.py", line 520, in check_X_y
check_consistent_length(X, y)
File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
"%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [150 600]
What is the correct shape for the data when training the knn algorithm in scikit-learn?
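The sample counts in that ValueError can be reproduced with NumPy alone. The sketch below is only an illustration: it uses zero-filled arrays as stand-ins for iris.data and iris.target (an assumption, the values do not matter here) and shows what each reshape does to the number of rows, which is what scikit-learn counts as samples.

```python
import numpy as np

# Stand-ins for iris.data and iris.target: 150 samples, 4 features.
X = np.zeros((150, 4))
y = np.zeros(150)

as_column = X.reshape(-1, 1)  # every single value becomes its own row
as_row = X.reshape(1, -1)     # all values are packed into one row

print(as_column.shape)  # (600, 1)
print(as_row.shape)     # (1, 600)

# The original X already has one row per label, so it needs no reshaping.
print(X.shape[0] == y.shape[0])  # True
```

With reshape(-1, 1) the training data has 600 rows against 150 labels, which matches the [150 600] in the error message.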
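【Comments】:
- How about upgrading your sklearn? Or are you already on the latest version?
- I am using version 0.18rc2.
- It most likely doesn't like the 1-D array you are passing to predict.
- I don't know what this KNN algorithm wants. If I try y = y.reshape(-1, 1), it says DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
- X and y are fine, but the data you pass to the predict function, knn.predict([3, 4, 5, 2]), is not a 2-D array. See the docs for what the KNN algorithm expects.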
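Following the last comment, here is a minimal sketch of what a 2-D input to predict could look like. It assumes the same iris setup as in the question and leaves X and y untouched; the name sample is introduced here only for illustration. Only the array passed to predict is reshaped, using the X.reshape(1, -1) form the warning suggests for a single sample.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target  # X: (150, 4), y: (150,), left as they are

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# One sample with four features must be a 2-D array of shape (1, 4).
sample = np.array([3, 4, 5, 2]).reshape(1, -1)
prediction = knn.predict(sample)

print(sample.shape)      # (1, 4)
print(prediction.shape)  # (1,)
```

Writing the sample directly as a nested list, knn.predict([[3, 4, 5, 2]]), gives the same (1, 4) shape.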
Tags: python scikit-learn knn