How can I evaluate a StratifiedKFold model?

【Question title】: How can I evaluate a StratifiedKFold model?
【Posted】: 2019-10-01 08:12:50
【Question description】:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import cross_val_predict

    # `dataset` (a 2-D NumPy array with the label in the last column) and
    # `seed` (an int) are assumed to be defined earlier; the question does not show them.

    # First 700 rows for training, the remaining rows for testing
    x_train = dataset[0:700,:-1]
    y_train = dataset[0:700,-1]
    x_test = dataset[700:,:-1]
    y_test = dataset[700:,-1]

    def create_model():
        # Small fully connected binary classifier with 8 input features
        model = Sequential()
        model.add(Dense(12, input_dim=8, activation='relu'))
        model.add(Dense(8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=64)
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

    scores = cross_val_score(model, x_train, y_train, cv=skf)
    predictions = cross_val_predict(model, x_test, y_test, cv=skf)

I want to train on [x_train], [y_train] using StratifiedKFold and then evaluate on [x_test], [y_test]. How can I do that? I tried cross_val_predict, but I don't think it is appropriate.
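One way to see why `cross_val_predict(model, x_test, y_test, cv=skf)` does not evaluate a model trained on `x_train`: `cross_val_predict` clones the estimator and refits it on folds of whatever data it is given. The sketch below illustrates this under stated assumptions. `RecordingClassifier` is a hypothetical `DummyClassifier` subclass invented here only to log how many rows each fit receives, and the 60-row test split is synthetic; neither comes from the question.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict


class RecordingClassifier(DummyClassifier):
    """Hypothetical helper: records the number of rows passed to every fit."""

    fit_sizes = []  # class-level list, so clones made by scikit-learn share it

    def fit(self, X, y, sample_weight=None):
        RecordingClassifier.fit_sizes.append(len(X))
        return super().fit(X, y, sample_weight=sample_weight)


# Synthetic stand-in for the test split: 60 rows, 8 features, balanced labels
rng = np.random.default_rng(0)
x_test = rng.normal(size=(60, 8))
y_test = np.array([0] * 30 + [1] * 30)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
predictions = cross_val_predict(
    RecordingClassifier(strategy="most_frequent"), x_test, y_test, cv=skf
)

# One entry per fold: the size of the subset of x_test used for that fit
print(RecordingClassifier.fit_sizes)
```

Each fit sees only part of the test rows and none of the training rows, so the call amounts to a separate cross-validation over the test set rather than a held-out evaluation of a model trained on `x_train`.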

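【Question comments】:

Do you mean you want a stratified split into train and test?
Yes, exactly. I want to split train and test in a stratified way: train (x_train, y_train) and test (x_test, y_test).

Tags: python numpy tensorflow keras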

【Solution 1】:

To split into train and test in a stratified way, you can use:

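    from sklearn.model_selection import train_test_split

    # Stratify on the label column (last column) so both splits keep the class ratio
    dataset_train, dataset_test = train_test_split(dataset,
                                                   stratify=dataset[:,-1],
                                                   test_size=0.2)

    # Split both datasets into X, y
    x_train, y_train = dataset_train[:,:-1], dataset_train[:,-1]
    x_test, y_test = dataset_test[:,:-1], dataset_test[:,-1]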

See:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Stratified Train/Test-split in scikit-learn
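To check that `stratify` behaves as described, the class ratio can be compared across the two splits. This is a minimal sketch on synthetic data; the 1000-row array with 30% positive labels and the `random_state=0` value are assumptions made for illustration, not part of the original `dataset`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `dataset`: 8 feature columns plus a binary label column
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))
labels = np.array([1.0] * 300 + [0.0] * 700)
dataset = np.column_stack([features, labels])

# Stratify on the last column so train and test keep the same class proportions
dataset_train, dataset_test = train_test_split(
    dataset, stratify=dataset[:, -1], test_size=0.2, random_state=0
)

x_train, y_train = dataset_train[:, :-1], dataset_train[:, -1]
x_test, y_test = dataset_test[:, :-1], dataset_test[:, -1]

# Fraction of positive labels in each split; these should be nearly identical
train_pos_rate = y_train.mean()
test_pos_rate = y_test.mean()
print(f"train positives: {train_pos_rate:.3f}, test positives: {test_pos_rate:.3f}")
```

Without `stratify`, the two rates can drift apart by chance, which matters most for small or imbalanced datasets.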

【Comments】:

Please check my answer.

【Solution 2】:

    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    accuracy = []
    for train in skf.split(x_train, y_train):
        model = Sequential()
        model.add(Dense(12, input_dim=8, activation='relu'))
        model.add(Dense(8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

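How about this? It runs, but I don't know whether it is correct.

【Comments】:

Why do you loop 3 times, creating a new model each time (overwriting the previous one), without ever using the loop variable train inside the loop?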
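The comment above points at what the loop is missing: `skf.split` yields a pair of index arrays per fold, and each fresh model has to be fitted on one and scored on the other. The following is a rough sketch of that pattern, not the asker's code. To keep it runnable without Keras, it swaps in scikit-learn's `LogisticRegression` for the network and uses synthetic data; both are assumptions. With the Keras model from the question, `create_model()` would build the `Sequential` network and the fit and score calls would become `model.fit(...)` and `model.evaluate(...)`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic data (assumption): 8 features, binary label driven by the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Stratified hold-out split; the test part is never touched during cross-validation
x_train, x_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0
)


def create_model():
    # Stand-in for the Keras network in the question (assumption)
    return LogisticRegression(max_iter=1000)


skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
fold_accuracy = []
for train_idx, val_idx in skf.split(x_train, y_train):
    model = create_model()  # fresh model per fold
    model.fit(x_train[train_idx], y_train[train_idx])  # train on this fold's training rows
    fold_accuracy.append(model.score(x_train[val_idx], y_train[val_idx]))  # validate

# After cross-validation, refit on the full training set and evaluate once on the test set
final_model = create_model()
final_model.fit(x_train, y_train)
test_accuracy = final_model.score(x_test, y_test)

print("fold accuracies:", fold_accuracy)
print("hold-out test accuracy:", test_accuracy)
```

The fold accuracies estimate how the configuration generalizes, while the single score on `x_test` is the held-out evaluation the question asks about.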