Post-processing cross-validated predictions before scoring: answers

【Question title】: Post-process cross-validated predictions before scoring
【Posted】: 2019-03-11 19:08:26
【Question】:

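I have a regression problem, and I am cross-validating the results to evaluate performance. I know in advance that the ground truth cannot be less than zero, so I would like to clip the predictions at zero before they are fed into the scoring metrics. I thought the make_scorer function might be useful here. Is it possible to post-process the predictions somehow after cross-validation, but before the evaluation metrics are applied to them?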

from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from sklearn.model_selection import cross_validate

# X = Stacked feature vectors
# y = ground truth vector
# regr = some regression estimator

#### How to indicate that the predictions need post-processing 
#### before applying the score function???
scoring = {'r2': make_scorer(r2_score),
           'neg_mse': make_scorer(mean_squared_error)}

scores = cross_validate(regr, X, y, scoring=scoring, cv=10)

PS: I know there are constrained estimators, but I would like to see how a heuristic like this performs.

【Comments】:

Tags: python scikit-learn cross-validation

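【Solution 1】:

One thing you can do is, as you suggested, wrap the metrics you want to use (r2_score, mean_squared_error) in custom scoring functions and pass those to make_scorer().

See this part of the sklearn documentation and this Stack Overflow post for some examples. In particular, your functions could do something like the following: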

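import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def clipped_r2(y_true, y_pred):
    # Clip negative predictions to zero before computing R^2
    y_pred_clipped = np.clip(y_pred, 0, None)
    return r2_score(y_true, y_pred_clipped)

def clipped_mse(y_true, y_pred):
    # Same post-processing, then compute the mean squared error
    y_pred_clipped = np.clip(y_pred, 0, None)
    return mean_squared_error(y_true, y_pred_clipped)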

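This lets you do the post-processing right inside the scorer, before the metric itself (here r2_score or mean_squared_error) is called. To use it, wrap each function with make_scorer as in your question, and set greater_is_better according to whether the function is a score (like r2, where greater is better) or a loss (like mean_squared_error, where lower is better and 0 is best):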

scoring = {'r2': make_scorer(clipped_r2, greater_is_better=True),
           'neg_mse': make_scorer(clipped_mse, greater_is_better=False)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)
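For reference, here is a minimal end-to-end sketch of the snippet above that can be run as-is. The synthetic data from make_regression, the LinearRegression estimator, and the clipping of y to make it non-negative are assumptions made purely for illustration; substitute your own X, y, and regr.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from sklearn.model_selection import cross_validate

def clipped_r2(y_true, y_pred):
    # Clip negative predictions to zero, then compute R^2
    return r2_score(y_true, np.clip(y_pred, 0, None))

def clipped_mse(y_true, y_pred):
    # Clip negative predictions to zero, then compute MSE
    return mean_squared_error(y_true, np.clip(y_pred, 0, None))

# Illustrative data only (assumption): a synthetic target forced to be >= 0
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
y = np.clip(y, 0, None)

# Illustrative estimator (assumption): any regressor with fit/predict works
regr = LinearRegression()

scoring = {'r2': make_scorer(clipped_r2, greater_is_better=True),
           'neg_mse': make_scorer(clipped_mse, greater_is_better=False)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)

# One value per fold; results are stored under 'test_<scorer name>'
print(scores['test_r2'].mean())
print(scores['test_neg_mse'].mean())
```

Because greater_is_better=False makes make_scorer flip the sign of the metric, the values under 'test_neg_mse' are negative MSEs, so larger (closer to zero) is still better.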

【Discussion】:

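Thanks! I did look at those resources. I am just confused about how those (clipped) evaluation metrics know what y_pred is. It should be passed in from the regressor on every fold. Is y_pred defined somewhere inside cross_validate? I will give it a try anyway.
That happens inside the cross_validate function. If you are interested, you can look at the source code, but it is a bit hard to read because you have to jump back and forth between a bunch of functions to track where the data is actually used to fit and score the model. But yes, the key point is that each cross-validation fold has its own y_pred and y_test.
I looked at the source code a while ago and, honestly, could not find my way through it with my limited skills :P. Thanks for confirming, this is perfect and clean! I was almost going to do this by hand with KFold.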
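To make the point about y_pred concrete, the following is a rough sketch of what happens on each fold, written as a manual KFold loop and compared against cross_validate. It is a simplified illustration, not the actual scikit-learn implementation; the synthetic data and the LinearRegression estimator are assumptions chosen only so the example runs.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, make_scorer
from sklearn.model_selection import KFold, cross_validate

def clipped_mse(y_true, y_pred):
    # Clip negative predictions to zero, then compute MSE
    return mean_squared_error(y_true, np.clip(y_pred, 0, None))

# Illustrative data and estimator (assumptions)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
y = np.clip(y, 0, None)
regr = LinearRegression()
cv = KFold(n_splits=10)

# Manual loop: fit on the training part, predict on the held-out part
manual = []
for train_idx, test_idx in cv.split(X):
    model = clone(regr).fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])  # this is where y_pred comes from
    # Negate to mirror what greater_is_better=False does in make_scorer
    manual.append(-clipped_mse(y[test_idx], y_pred))

# Same splits, but letting cross_validate call the scorer for us
auto = cross_validate(
    regr, X, y, cv=cv,
    scoring=make_scorer(clipped_mse, greater_is_better=False),
)['test_score']

print(np.allclose(manual, auto))
```

In other words, the scorer built by make_scorer receives the fitted estimator and the held-out fold, calls predict itself, and only then hands y_true and y_pred to the wrapped metric function.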