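DeprecationWarning while using the knn algorithm in scikit-learn

【Question title】: DeprecationWarning while using knn algorithm in scikit-learn
【Posted】: 2017-02-05 03:06:03
【Question description】:

I am trying out the scikit-learn library. I loaded the iris dataset and tried to train a knn classifier to predict some results. The code is as follows: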

from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

knn = KNeighborsClassifier(n_neighbors=1)

X = iris.data
y = iris.target

print(X.shape)
print(y.shape)

# train the model
knn.fit(X, y)

# predict for a single sample, passed here as a flat (1-D) list
knn.predict([3, 4, 5, 2])

But I get the following warning:

(150L, 4L)
(150L,)
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

I searched on Google and found some workarounds. I tried X = X.reshape(-1, 1) and X = X.reshape(1, -1), but then I get the following error:

Traceback (most recent call last):
  File "E:/Analytics Practice/Social Media Analytics/Python Services/DataAnalysis/sk-learn-dir/test.py", line 13, in <module>
    knn.fit(X, y)
  File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\neighbors\base.py", line 778, in fit
    X, y = check_X_y(X, y, "csr", multi_output=True)
  File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\utils\validation.py", line 520, in check_X_y
    check_consistent_length(X, y)
  File "C:\python-venv-test-2.7.10\lib\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [150 600]

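What is the correct shape of the data that the knn algorithm in scikit-learn expects for training?

【Comments】:

How about upgrading your sklearn? Or are you already on the latest version?
I am using version 0.18rc2.
It most likely does not like the 1-D array you pass to predict.
I am not sure what this KNN algorithm wants, because if I try y = y.reshape(-1, 1), it says DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
X and y are fine, but the data structure you pass to the predict function, knn.predict([3, 4, 5, 2]), is not a 2-D array. The docs describe what the KNN algorithm expects.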
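
As the last comment says, X and y already had the right shapes, so reshaping X is what caused the "inconsistent numbers of samples: [150 600]" error. A minimal sketch of the shapes involved, using a zero-filled stand-in for iris.data (the names as_column and as_row are only illustrative):

```python
import numpy as np

# Stand-in for iris.data and iris.target: 150 samples, 4 features.
# The values do not matter here, only the shapes.
X = np.zeros((150, 4))
y = np.zeros(150)

as_column = X.reshape(-1, 1)  # every value becomes its own "sample"
as_row = X.reshape(1, -1)     # all values are squeezed into one "sample"

print(X.shape)          # (150, 4)
print(as_column.shape)  # (600, 1)
print(as_row.shape)     # (1, 600)
print(y.shape)          # (150,)
```

With X.reshape(-1, 1) the first dimension is 600 while y still has 150 entries, which matches the numbers in the traceback. The reshape hint in the warning is aimed at the array passed to predict, not at the training data.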

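Tags: python scikit-learn knn

【Solution 1】:

Thanks to @tttthomasssss for the help. Here is what I was doing wrong:

When I write [3, 4, 5, 2], it is converted to a 1-D array of shape (4,), which is ambiguous: it could be one sample with four features or four samples with one feature each. When I write [[3, 4, 5, 2]], it becomes a 2-D array of shape (1, 4), that is, one sample with four features. Since this is a single data point with 4 different feature values, I have to pass [[3, 4, 5, 2]] to predict. Here is the code that helped me work out the shapes of the two arrays: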

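import numpy as np

# flat list: becomes a 1-D array
predict_array = [3, 4, 5, 2]
predict_array = np.asarray(predict_array)
print(predict_array.shape)

# nested list: becomes a 2-D array with one row
predict_array = [[3, 4, 5, 2]]
predict_array = np.asarray(predict_array)
print(predict_array.shape)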

This is the output:

(4L,)
(1L, 4L)
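
Putting this together, here is a sketch of the corrected script. It assumes a scikit-learn installation that ships the iris dataset; the names sample, pred_list and pred_reshaped are only illustrative. It shows two equivalent ways to pass a single sample: a nested list, or an existing 1-D array reshaped with reshape(1, -1) as the warning suggests.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
knn = KNeighborsClassifier(n_neighbors=1)

# Training data is left untouched: X is (150, 4), y is (150,).
knn.fit(iris.data, iris.target)

# Option 1: a nested list, one inner list per sample.
pred_list = knn.predict([[3, 4, 5, 2]])

# Option 2: reshape an existing 1-D array into a single row.
sample = np.asarray([3, 4, 5, 2])
pred_reshaped = knn.predict(sample.reshape(1, -1))

print(pred_list.shape)  # (1,) -> one predicted label per input row
print(pred_list, pred_reshaped)
```

Both calls return one label for the one row passed in. To predict several points at once, add more rows to the outer list.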
