【Question title】: Is it possible to use the same k-folds in cross_val_predict that are in cross_val_score?
【Posted】: 2021-01-01 01:18:23
【Question description】:

Hello, suppose we compute cross-validated accuracy as follows:

cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')    

Is it possible to reuse the same k-fold inputs from cross_val_score to estimate the y predictions and build a confusion matrix (as shown below)?

y_pred = cross_val_predict(model, X_train, y_train, cv=5)
conf_mat = confusion_matrix(y_train, y_pred)  # y_pred holds out-of-fold predictions for X_train, so compare against y_train

Is there a way to store exactly how the k folds were split in cross_val_score, so that the confusion matrix is comparable?

Cheers :)

【Comments】:

    Tags: python pandas dataframe scikit-learn classification


    【Solution 1】:

    The following should work:

    from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
    k_folds = KFold(n_splits=5)
    splits = list(k_folds.split(X_train, y_train)) # note list here as k_folds.split is a one-off generator
    cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
    y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
    

    For more information, see the documentation for cross_val_score and cross_val_predict.

    A complete working example:
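As a side note, here is a minimal sketch of an alternative that is not part of the original answer. It assumes a `KFold` splitter with `shuffle=True` and an integer `random_state`; the toy array `X` is made up for illustration. With a fixed integer seed, every call to `split` should yield the same folds, so the splitter object itself can be passed as `cv` to both functions instead of a materialized list.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(40).reshape(20, 2)  # toy data, for illustration only

# An integer random_state makes each call to split() produce the same folds.
k_folds = KFold(n_splits=5, shuffle=True, random_state=0)
first = [test_idx for _, test_idx in k_folds.split(X)]
second = [test_idx for _, test_idx in k_folds.split(X)]

# Compare the test indices of the two passes, fold by fold.
same_folds = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_folds)
```

Materializing the splits with `list(...)`, as in the answer, is still the most explicit option, since it makes the shared folds visible and inspectable.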

    from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    
    k_folds = KFold(n_splits=5)
    X_train, y_train = make_classification(1000)
    splits = list(k_folds.split(X_train, y_train))
    model = LogisticRegression()
    cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
    y_pred = cross_val_predict(model, X_train, y_train, cv=splits)
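To connect this back to the confusion matrix in the question, the sketch below reuses the same `splits` list for both calls and then scores the out-of-fold predictions against `y_train`. The `random_state=0` on `make_classification` and the `pooled_acc` check are additions for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Synthetic binary classification data; the seed is an assumption for reproducibility.
X_train, y_train = make_classification(1000, random_state=0)
model = LogisticRegression()

# Materialize the folds once so both functions see identical train/test indices.
splits = list(KFold(n_splits=5).split(X_train, y_train))

cv_acc = cross_val_score(model, X_train, y_train, cv=splits, scoring='accuracy')
y_pred = cross_val_predict(model, X_train, y_train, cv=splits)

# Each training sample gets exactly one out-of-fold prediction,
# so the reference labels are y_train (not a separate y_test).
conf_mat = confusion_matrix(y_train, y_pred)
print(conf_mat)

# The five folds are equal-sized here (200 samples each), so the mean
# per-fold accuracy should match the accuracy pooled over all predictions.
pooled_acc = accuracy_score(y_train, y_pred)
print(np.isclose(cv_acc.mean(), pooled_acc))
```

If the folds were unequal in size, the mean of the per-fold scores and the pooled accuracy could differ slightly, even with identical splits.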
    

    【Discussion】:

    • Thanks, man!! Big help :)