What is the difference between the default CV in Gridsearch and Kfold

【Question title】: What is the difference between the default CV in Gridsearch and Kfold
【Posted】: 2018-03-24 15:32:21
【Question description】:

I would like to know the difference between the default cross-validation built into sklearn's GridSearchCV and passing it a KFold object, as in the code below:

Without KFold:

clf = GridSearchCV(estimator=model, param_grid=parameters, cv=10, scoring='f1_macro')
clf = clf.fit(xOri, yOri)

With KFold:

NUM_TRIALS = 5
for i in range(NUM_TRIALS):
    # a fresh shuffled 10-fold splitter per trial, seeded with the trial index
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=model, param_grid=parameters, cv=cv, scoring='f1_macro')
    clf = clf.fit(xOri, yOri)

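As I understand from the manual, both split the data into 10 parts, 9 for training and 1 for validation. In the KFold example, however, the splitting is repeated 5 times (NUM_TRIALS = 5), and each time the data is shuffled before being split into 10 parts. Am I right?

【Question discussion】:

Tags: python scikit-learn cross-validation

【Solution 1】: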

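It looks like you are right, yes.

GridSearchCV uses KFold or StratifiedKFold, depending on whether your model is for regression (KFold) or classification (StratifiedKFold). More precisely, when cv is an integer, StratifiedKFold is used if the estimator is a classifier and the target is binary or multiclass; in all other cases KFold is used. Neither one shuffles the data by default.

Since I don't know what your data looks like, I can't be sure which one is used in this case.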

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
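
If you want to check which splitter an integer cv resolves to, a small sketch along these lines may help. It calls sklearn.model_selection.check_cv, which, as far as I know, is the helper GridSearchCV relies on internally. The two target arrays are made-up toy data, not yours.

```python
import numpy as np
from sklearn.model_selection import check_cv

# Hypothetical targets, for illustration only
y_class = np.array([0, 1] * 10)      # binary class labels
y_reg = np.linspace(0.0, 1.0, 20)    # continuous regression targets

# classifier=True mimics passing a classifier to GridSearchCV,
# classifier=False mimics passing a regressor
cv_clf = check_cv(cv=10, y=y_class, classifier=True)
cv_reg = check_cv(cv=10, y=y_reg, classifier=False)

# Print the splitter class and whether it shuffles before splitting
print(type(cv_clf).__name__, cv_clf.shuffle)
print(type(cv_reg).__name__, cv_reg.shuffle)
```

With scoring='f1_macro' you are most likely fitting a classifier, so the first case is probably the one that applies to you, but that is a guess about your setup.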

But your code above will repeat the KFold validation 5 times, each time with a different random seed.
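
One thing to watch in that loop: clf is reassigned on every iteration, so only the last trial's search survives. If the goal is to compare the trials, a rough sketch like the following stores the result of each one. The iris data, the SVC model and the small C grid are placeholders I picked for illustration; substitute your own xOri, yOri, model and parameters.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Placeholder data, model and grid (assumptions, not from the question)
X, y = load_iris(return_X_y=True)
model = SVC()
parameters = {"C": [0.1, 1, 10]}

NUM_TRIALS = 5
scores = []
best_params = []
for i in range(NUM_TRIALS):
    # Same loop as in the question, but the outcome of each trial is stored
    cv = KFold(n_splits=10, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=model, param_grid=parameters,
                       cv=cv, scoring="f1_macro")
    clf.fit(X, y)
    scores.append(clf.best_score_)
    best_params.append(clf.best_params_)

# Spread of the best cross-validated score across the differently seeded trials
print(np.mean(scores), np.std(scores))
print(best_params)
```

A small standard deviation would suggest that the choice of seed matters little for your data.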

Does this produce meaningfully different data splits? Not sure.
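
A quick way to look at this yourself is to compare the folds directly. The sketch below uses 20 made-up samples and compares the first test fold of an unshuffled KFold, which is what an integer cv gives you in the regression case, with shuffled KFold objects seeded 0 to 4. The helper first_test_fold is just a name I made up for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)  # 20 toy samples, hypothetical data

def first_test_fold(cv):
    """Return the test indices of the first fold as a sorted tuple."""
    _, test_idx = next(iter(cv.split(X)))
    return tuple(sorted(test_idx.tolist()))

# No shuffling: contiguous blocks, identical no matter how often you repeat it
plain = [first_test_fold(KFold(n_splits=10)) for _ in range(5)]

# Shuffling with a different seed per trial, as in the question's loop
shuffled = [first_test_fold(KFold(n_splits=10, shuffle=True, random_state=i))
            for i in range(5)]

print("unshuffled first fold:", plain[0])
print("distinct unshuffled folds:", len(set(plain)))
print("distinct shuffled folds:", len(set(shuffled)))
```

So the splits do differ from seed to seed. Whether that difference changes the selected hyperparameters depends on your data and model.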

【Discussion】: