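【Posted】: 2015-02-02 21:50:54
【Question】:
I am trying to understand why the following happens. I am using the iris data and running cross-validation with a k-nearest-neighbors classifier to choose the best k.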
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn import grid_search
from sklearn.cross_validation import train_test_split

# Assumed loading step: the post does not show how X and Y were defined
iris = load_iris()
X, Y = iris.data, iris.target

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)

# Try k = 1..20 with 10-fold cross-validation
parameters = {'n_neighbors': range(1, 21)}
knn = KNeighborsClassifier()
clf = grid_search.GridSearchCV(knn, parameters, cv=10)
clf.fit(X_train, Y_train)
The clf object then holds the results.
print clf.grid_scores_
[mean: 0.94000, std: 0.08483, params: {'n_neighbors': 1},
 mean: 0.93000, std: 0.08251, params: {'n_neighbors': 2},
 mean: 0.94000, std: 0.08456, params: {'n_neighbors': 3},
 mean: 0.95000, std: 0.08101, params: {'n_neighbors': 4},
 mean: 0.95000, std: 0.08562, params: {'n_neighbors': 5},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 6},
 mean: 0.95000, std: 0.08512, params: {'n_neighbors': 7},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 8},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 9},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 10},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 11},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 12},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 13},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 14},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 15},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 16},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 17},
 mean: 0.93000, std: 0.09458, params: {'n_neighbors': 18},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 19},
 mean: 0.93000, std: 0.10887, params: {'n_neighbors': 20}]
But when I pull out the 10 CV results for the first case, k=1,
print clf.grid_scores_[0].cv_validation_scores
I get
array([ 1. , 0.90909091, 1. , 0.72727273, 0.9 ,
1. , 1. , 1. , 1. , 0.88888889])
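However, the mean of these 10 observations,
print clf.grid_scores_[0].cv_validation_scores.mean()
is 0.942525252525, not the 0.94000 shown on the object.
So I am quite confused about what the reported mean is actually computing and why the two values differ. I read the documentation but did not find anything that helped. What am I missing?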
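The mismatch can be reproduced with plain NumPy, which also makes one possible explanation easy to check. The fractional fold scores look like 10/11, 8/11, 9/10 and 8/9, which suggests the ten folds are not all the same size. If the reported mean weights each fold by its number of test samples (which is what the `iid=True` setting of `GridSearchCV` does in scikit-learn releases of that era), the result would differ from a plain average. The sketch below is only an illustration: `fold_sizes` is hypothetical, with 11, 11, 10 and 9 inferred from the fractions and the remaining sizes guessed so the total is 100, the assumed number of training rows.

```python
import numpy as np

# Fold scores copied from the k=1 output above.
scores = np.array([1.0, 0.90909091, 1.0, 0.72727273, 0.9,
                   1.0, 1.0, 1.0, 1.0, 0.88888889])

# Hypothetical fold sizes (not taken from the post): the entries for the
# imperfect folds follow from the fractions 10/11, 8/11, 9/10 and 8/9;
# the sizes of the perfect-score folds are guesses that sum to 100 overall.
fold_sizes = np.array([10, 11, 10, 11, 10, 10, 10, 10, 9, 9])

# Plain average over folds: every fold counts equally.
unweighted_mean = scores.mean()

# Sample-weighted average: every test sample counts equally.
weighted_mean = np.average(scores, weights=fold_sizes)

print(round(unweighted_mean, 6))
print(round(weighted_mean, 6))
```

Under these assumed sizes the weighted value comes out at about 0.94 while the plain average stays near 0.9425. Because the guessed folds all score 1.0, any split of their sizes that keeps the total at 100 gives the same weighted value.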
【Comments】: