How to do RFECV in scikit-learn with KFold, not StratifiedKFold?

【Question title】: How to do RFECV in scikit-learn with KFold, not StratifiedKFold?
【Posted】: 2015-07-15 17:05:30
【Question】:

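from sklearn.cross_validation import StratifiedKFold, KFold
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# X (feature matrix) and y (class labels) are assumed to be defined already.
rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=StratifiedKFold(y, 10),
              scoring='accuracy')
rfecv.fit(X, y)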

is an example of RFECV with StratifiedKFold. The question is how to do RFECV with a plain KFold.

cv=KFold(y, 10) is not the answer, because KFold and StratifiedKFold take and return entirely different values.

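【Comments】:

You are linking to very old documentation; the new docs are here: scikit-learn.org/stable/tutorial/statistical_inference/… Did you actually read what you linked to?
I pasted the wrong link by accident, because (fortunately) scikit has a stable API. I often read the scikit 0.16 (current, yet) docs in PDF form, and they still say the same thing. Thanks.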

Tags: python machine-learning scikit-learn classification

【Solution 1】:

KFold(len(y), n_folds = n_folds) is the answer. So, for 10 folds, it would be

rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=KFold(len(y),n_folds=10),
scoring='accuracy')
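
The call above uses the `sklearn.cross_validation` module, which was deprecated in scikit-learn 0.18 and later removed. In newer releases the splitters live in `sklearn.model_selection`, and `KFold` takes `n_splits` instead of the sample count. A minimal sketch of the same idea under that assumption, with synthetic data from `make_classification` standing in for the asker's `X` and `y`:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)

# Plain (non-stratified) 10-fold splitter; it does not look at y at all.
cv = KFold(n_splits=10)

rfecv = RFECV(estimator=LogisticRegression(max_iter=1000), step=1, cv=cv,
              scoring='accuracy')
rfecv.fit(X, y)

print(rfecv.n_features_)  # number of features RFECV decided to keep
print(rfecv.support_)     # boolean mask over the original features
```

Because the modern splitter no longer needs the labels or the sample count at construction time, the same `cv` object can be reused with any dataset.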

【Comments】:

【Solution 2】:

You can manually create your own CV strategy that mimics whatever KFold does:

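def createCV():
    '''Returns a list of (train_indices, test_indices) tuples, something like:

    custom_cv = [([0, 1, 2, 3, 4, 5, 6], [7]),
                 ([0, 1, 2, 3, 4, 5], [6]),
                 ([0, 1, 2, 3, 4], [5]),
                 ([0, 1, 2, 3], [4]),
                 ([0, 1, 2], [3])]
    where the first element of each tuple is the training set and the second is the test set.
    '''
    # Illustrative body only: return the example splits from the docstring.
    # Replace this with whatever splitting logic your data needs.
    return [([0, 1, 2, 3, 4, 5, 6], [7]),
            ([0, 1, 2, 3, 4, 5], [6]),
            ([0, 1, 2, 3, 4], [5]),
            ([0, 1, 2, 3], [4]),
            ([0, 1, 2], [3])]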

manual_cv  = createCV()
rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=manual_cv,
scoring='accuracy')

You can even take what KFold gives you inside createCV and rearrange it to suit your CV needs.
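
One way to read "rearrange what KFold gives you" is sketched below. It assumes the newer `sklearn.model_selection.KFold` API, and `create_cv` is a hypothetical helper name, not something from the answer. The rearrangement chosen here (keeping only training samples that come before the test fold, similar in spirit to the expanding splits in the docstring above) is just one example of post-processing the splits.

```python
import numpy as np
from sklearn.model_selection import KFold


def create_cv(n_samples, n_splits=5):
    """Hypothetical helper: start from KFold's splits and post-process them."""
    placeholder = np.zeros((n_samples, 1))  # KFold only needs the sample count
    splits = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(placeholder):
        # Keep only the training samples that precede the test fold.
        earlier = train_idx[train_idx < test_idx.min()]
        if earlier.size > 0:  # skip folds that would have no training data
            splits.append((earlier, test_idx))
    return splits


manual_cv = create_cv(n_samples=10, n_splits=5)
for train_idx, test_idx in manual_cv:
    print(train_idx, test_idx)
```

The resulting list of `(train, test)` index pairs can be passed as `cv=manual_cv`, since `RFECV` accepts an iterable of such pairs. With real classification data, check that every training subset still contains more than one class.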

【Comments】:

I'm sorry, but I figured it out. Thank you for your answer!