【Question title】: Stratified GroupShuffleSplit in Scikit-learn
【Posted】: 2017-07-07 05:37:37
【Question description】:

I would like to ask whether it is possible to perform a "Stratified GroupShuffleSplit" in scikit-learn, i.e. a combination of GroupShuffleSplit and StratifiedShuffleSplit.

Here is a sample of the code I am using:

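from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# group-aware shuffle split: no group appears in both train and test
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(
    allr_sets_nor[:, :2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
    param_grid=param_grid, scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)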

Here I applied GroupShuffleSplit, but I would also like to add stratification according to allr_labels.

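【Comments】:

  • StratifiedShuffleSplit also has a groups parameter, if you need one. Just use StratifiedShuffleSplit with allr_labels and pass the groups to the fit() method when fitting GridSearchCV.
  • Unfortunately, that did not work for me. I think this option has no effect, because the documentation says the parameter is "Always ignored, exists for compatibility."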
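
The point made in the second comment can be checked directly. The following is a minimal sketch on made-up toy data (the array sizes, labels and `random_state` are assumptions for illustration, not values from the question): it splits once with `groups` and once without, and compares the resulting indices.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (made up for illustration): 8 groups of 6 consecutive samples,
# with a single class label shared by all samples of a group.
n_groups, group_size = 8, 6
groups = np.repeat(np.arange(n_groups), group_size)
y = np.repeat([0, 1, 0, 1, 0, 1, 0, 1], group_size)
X = np.zeros((len(y), 2))

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
with_groups = list(sss.split(X, y, groups=groups))
without_groups = list(sss.split(X, y))

# If groups were used, the two runs would differ; here they are compared
# index array by index array.
same = all(
    np.array_equal(a_tr, b_tr) and np.array_equal(a_te, b_te)
    for (a_tr, a_te), (b_tr, b_te) in zip(with_groups, without_groups)
)
print(same)
```

Since the splitter is seeded with a fixed integer, identical output from both calls indicates that `groups` played no role, which matches the wording in the documentation quoted above.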

Tags: python scikit-learn dataset shuffle cross-validation


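【Solution 1】:

I solved the problem by applying StratifiedShuffleSplit to the groups and then finding the training and test set indices manually, since they are tied to the group indices (in my case each group contains 6 consecutive sets, from 6*index to 6*index+5).

As follows: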

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# stratified splitting over the groups only
# (all_labels is assumed to hold one label per group here)
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(all_groups, all_labels)

# build the cross-validation iterable: a list of n_splits tuples, each holding
# two numpy arrays with the training and testing sample indices
cv = []
for train_index, test_index in sss:
    # group g owns the 6 consecutive samples 6*g, ..., 6*g+5
    train_is = np.concatenate([train_index*6 + k for k in range(6)])
    test_is = np.concatenate([test_index*6 + k for k in range(6)])
    cv.append((train_is, test_is))

opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
                   scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)
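
The same idea can probably be written without relying on groups being blocks of 6 consecutive samples. The sketch below is not part of the original answer: `stratified_group_shuffle_split` is a hypothetical helper name, the toy data is made up, and it assumes every sample in a group carries the same label. It stratifies over the unique groups and then maps each selected group back to its sample indices.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_group_shuffle_split(y, groups, n_splits=5, test_size=0.25,
                                   random_state=None):
    """Yield (train_idx, test_idx) sample indices (hypothetical helper).

    Assumes all samples of a group share one label.
    """
    y = np.asarray(y)
    groups = np.asarray(groups)
    # Take the first sample of each unique group as its representative.
    unique_groups, first_idx = np.unique(groups, return_index=True)
    group_labels = y[first_idx]

    sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=random_state)
    # Stratify at group level, then expand each group to its samples.
    for g_train, g_test in sss.split(unique_groups, group_labels):
        train_idx = np.flatnonzero(np.isin(groups, unique_groups[g_train]))
        test_idx = np.flatnonzero(np.isin(groups, unique_groups[g_test]))
        yield train_idx, test_idx


# Toy data (made up): 12 groups of 6 samples, classes alternating per group.
groups = np.repeat(np.arange(12), 6)
y = np.repeat(np.tile([0, 1], 6), 6)
cv = list(stratified_group_shuffle_split(y, groups, n_splits=3,
                                         test_size=0.25, random_state=0))
```

The resulting `cv` list has the same shape as the one built above (a list of index-array tuples), so it can be passed as `cv=cv` to GridSearchCV. If K-fold rather than shuffle-split semantics are acceptable, scikit-learn 1.0 and later also provide `StratifiedGroupKFold`.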
