【Question title】: python split to train/test/val using GroupShuffleSplit
【Posted】: 2021-09-10 02:47:00
【Question description】:

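I have a dataset that I want to split into train (60%), test (20%), and validation (20%) sets using GroupShuffleSplit. Can it produce three folds instead of only two (train/test)?

If not, what is the best way to achieve this? The code should look something like this: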

gss = GroupShuffleSplit(n_splits=6, train_size=.6, random_state=42)
for train_index, test_index, valid_index in gss.split(X = x, y = y, groups=groups):
    ....

【Question comments】:

    Tags: python machine-learning scikit-learn deep-learning cross-validation


    【Solution 1】:

    You can call train_test_split twice to get a 60-20-20 split.

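    from sklearn.model_selection import train_test_split

    # First hold out 20% of the data as the test set.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    # Then take 25% of the remaining 80% (0.25 * 0.8 = 0.2) as the validation set.
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)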
    
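As a quick check of the arithmetic above, here is a minimal sketch on made-up data. The arrays, the `random_state` values, and the `X_rest`/`y_rest` names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 2 features each (illustrative only).
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: hold out 20% as the test set, leaving 80%.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Step 2: 25% of the remaining 80% is 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)
```

Note that train_test_split does not take groups into account, so samples from the same group can end up in different subsets.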

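    【Comments】:

    • Thanks, but can this be done inside a for loop using GroupShuffleSplit?
    • Sorry, I'm not familiar with GroupShuffleSplit.
    • @Adrash Wase and how would you wrap this in a for loop?
    【Solution 2】:

    This worked for me. Do the split twice: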

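    from sklearn.model_selection import GroupShuffleSplit

    # First split: 80% of the groups for train+test, 20% for validation.
    gs_1 = GroupShuffleSplit(n_splits=1, train_size=.8, random_state=42)
    # split() yields positional indices, so index with .iloc; groups come from X's "group" column.
    train_test_ix, val_ix = next(gs_1.split(X, y, groups=X.group))
    X_train_test = X.iloc[train_test_ix]
    X_val = X.iloc[val_ix]
    y_train_test = y.iloc[train_test_ix]
    y_val = y.iloc[val_ix]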
    

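    Reset the index (it did not work for me without the reset; split returns positional indices, which only match the row labels after a reset):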

    X_train_test = X_train_test.reset_index(drop=True)
    y_train_test = y_train_test.reset_index(drop=True)
    # Second split: 75% of the remaining groups for train, 25% for test (60/20 overall).
    gs_2 = GroupShuffleSplit(n_splits=1, train_size=.75, random_state=42)
    train_ix, test_ix = next(gs_2.split(X_train_test, y_train_test, groups=X_train_test.group))
    X_train = X_train_test.iloc[train_ix]
    X_test = X_train_test.iloc[test_ix]
    # Index y_train_test (not the original y): the indices refer to the reduced set.
    y_train = y_train_test.iloc[train_ix]
    y_test = y_train_test.iloc[test_ix]
    
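To get this inside a for loop, as asked in the question, one option is to nest the two splits in a generator. The following is only a sketch: `group_train_val_test_splits` is a hypothetical helper (not part of scikit-learn), the toy data is made up, and it assumes an integer `random_state`.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def group_train_val_test_splits(X, y, groups, n_splits=6, random_state=42):
    """Yield (train_idx, val_idx, test_idx) arrays of positional indices.

    Hypothetical helper: an outer 80/20 group split for the test set,
    then an inner 75/25 group split of the remainder for validation.
    """
    groups = np.asarray(groups)
    outer = GroupShuffleSplit(
        n_splits=n_splits, test_size=0.2, random_state=random_state
    )
    for i, (rest_idx, test_idx) in enumerate(outer.split(X, y, groups=groups)):
        inner = GroupShuffleSplit(
            n_splits=1, test_size=0.25, random_state=random_state + i
        )
        # The inner split only needs the number of remaining samples and
        # their groups; the indices it returns are relative to rest_idx.
        train_rel, val_rel = next(inner.split(rest_idx, groups=groups[rest_idx]))
        yield rest_idx[train_rel], rest_idx[val_rel], test_idx


# Toy data (illustrative only): 50 groups with 4 samples each.
groups = np.repeat(np.arange(50), 4)
X = np.random.default_rng(0).normal(size=(200, 3))
y = np.zeros(200)

for train_index, valid_index, test_index in group_train_val_test_splits(X, y, groups):
    # No group appears in more than one of the three subsets.
    assert set(groups[train_index]).isdisjoint(groups[valid_index])
    assert set(groups[train_index]).isdisjoint(groups[test_index])
    assert set(groups[valid_index]).isdisjoint(groups[test_index])
```

GroupShuffleSplit applies the proportions to groups rather than to samples, so the 60/20/20 ratio is exact at the sample level only when all groups have the same size. Because the helper yields positional indices, index a DataFrame with `.iloc`.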

    【Comments】:
