【Question title】: cross_val_score default scoring not consistent?
【Posted】: 2020-11-18 20:25:01
【Question】:

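According to the docs,

for the scoring parameter of cross_val_score: if None, the estimator's default scorer (if available) is used.

For DecisionTreeRegressor, the default criterion is mse. So why am I getting different results here?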

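from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# X, y are the feature matrix and target; their definition is not shown in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26)

# Leading minus sign: turn the negated MSE back into a positive MSE
-cross_val_score(dt, X_train, y_train, cv=10, scoring='neg_mean_squared_error')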

>>> array([ 46.94808341,  18.78121305,  18.19914701,  18.06935431,
        17.19546733,  28.91247609,  39.41410887,  21.30453162,
        31.96443414,  23.74191199])


cross_val_score(dt, X_train, y_train, cv=10)

>>> array([ 0.35723619,  0.75254466,  0.7181376 ,  0.65718608,  0.72531937,
        0.4752839 ,  0.43169728,  0.63916363,  0.41406146,  0.68977882])

If I had to guess, it looks like the default scoring is R2 rather than mse. Is my understanding of the default scorer correct, or is this a bug?

【Comments】:

Tags: python scikit-learn cross-validation decision-tree


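【Answer 1】:

The default scorer of DecisionTreeRegressor is the R2 score; you can find it in the DecisionTreeRegressor docs. The criterion parameter (mse) only controls how splits are chosen while the tree is fitted; it does not determine what score() returns, and score() is what cross_val_score falls back to when scoring is None.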

 score(self, X, y, sample_weight=None)[source]

    Return the coefficient of determination R^2 of the prediction.

    The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
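To make the docstring above concrete, here is a minimal sketch that reproduces the default `cross_val_score` output by hand. It assumes `load_diabetes` as a stand-in for the asker's `X`, `y` (which the question does not show), fixes `random_state=0` on the tree purely so the two runs are comparable, and relies on `cv=10` meaning an unshuffled `KFold` for a regressor. It is an illustration, not the library's internal code path.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: load_diabetes stands in for the data used in the question.
X, y = load_diabetes(return_X_y=True)

# random_state on the tree is an addition, so both runs build identical trees.
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=0)

# scoring is left at its default (None), so the estimator's score() is used.
default_scores = cross_val_score(dt, X, y, cv=10)

score_method = []   # per-fold result of estimator.score()
manual_r2 = []      # per-fold 1 - u/v, following the docstring
for train_idx, test_idx in KFold(n_splits=10).split(X):
    model = clone(dt).fit(X[train_idx], y[train_idx])
    y_true = y[test_idx]
    y_pred = model.predict(X[test_idx])
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    score_method.append(model.score(X[test_idx], y_true))
    manual_r2.append(1 - u / v)

print(np.allclose(default_scores, score_method))
print(np.allclose(default_scores, manual_r2))
```

Both comparisons should agree, which is consistent with the default scorer being R2 and independent of the split criterion.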

【Comments】:

    【Answer 2】:

    @PV8 is certainly right, but I would like to point out two details.

    Detail #1: How do you use the R2 score as the scoring metric? Answer: make_scorer

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import r2_score, make_scorer
    from sklearn.tree import DecisionTreeRegressor
    
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26)
    print(cross_val_score(dt, X_train, y_train, cv=10, scoring=make_scorer(r2_score)))
    
    

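    If you run this program several times, you will still get different results, because train_test_split shuffles the data differently on every run.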
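As a side note on Detail #1, scikit-learn also accepts predefined scorer names as strings, so `make_scorer(r2_score)` can be swapped for `scoring="r2"`. The sketch below compares the two and also shows why the question negates the `neg_mean_squared_error` result. It skips the train/test split and fixes `random_state=0` on the tree; both are simplifications made here so the comparison is repeatable, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# random_state is an addition so every call fits identical trees.
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=0)

# Two ways of asking for R^2 explicitly.
via_make_scorer = cross_val_score(dt, X, y, cv=10, scoring=make_scorer(r2_score))
via_string = cross_val_score(dt, X, y, cv=10, scoring="r2")

# Scorers follow a "higher is better" convention, so MSE is reported negated.
neg_mse = cross_val_score(dt, X, y, cv=10, scoring="neg_mean_squared_error")
mse = -neg_mse  # flip the sign to recover the usual non-negative MSE

print(np.allclose(via_make_scorer, via_string))
print(mse)
```

The string form is shorter and avoids the extra import; `make_scorer` is mainly useful for metrics that have no predefined name or that need custom arguments.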

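    Detail #2: How do you get consistent results?

    You need to set the random_state argument (here in train_test_split) to get reproducible results. DecisionTreeRegressor accepts a random_state as well, which can matter when several candidate splits are equally good.

    For example: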

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import r2_score, make_scorer
    from sklearn.tree import DecisionTreeRegressor
    
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26)
    print(cross_val_score(dt, X_train, y_train, cv=10, scoring=make_scorer(r2_score)))
    
    

    The result is now the same on every run.
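A quick way to check the reproducibility claim is to run the whole pipeline twice with the same seed and compare. In this sketch `run_cv` is a hypothetical helper introduced only for illustration, and passing the seed to the tree as well as to `train_test_split` is an extra precaution rather than something the answer requires.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor


def run_cv(seed):
    """Hypothetical helper: split, build the tree and cross-validate with one seed."""
    X, y = load_diabetes(return_X_y=True)
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # Seeding the tree too removes any randomness in tie-breaking between splits.
    dt = DecisionTreeRegressor(
        max_depth=4, min_samples_leaf=0.26, random_state=seed
    )
    return cross_val_score(dt, X_train, y_train, cv=10, scoring="r2")


first = run_cv(42)
second = run_cv(42)

# Same seed, same split, same trees: the ten fold scores should match.
print(np.allclose(first, second))
```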

    【Comments】:
