Model Fitting and Cross-Validation: Answer

[Question title]: Model Fitting and Cross-Validation
[Posted]: 2020-03-26 09:38:50
[Question]:

I am training on one dataset and then testing on a different dataset.

To improve performance, I want to fine-tune my parameters with 5-fold cross-validation.

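However, I don't think my code is right: when I try to apply the model to my test set, I get an error saying it has not been fitted yet. I thought the cross-validation step fitted the model? Or do I have to extract the fitted model from it somehow?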

Here is my code:

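from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

svm = SVC(kernel='rbf', probability=True, random_state=42)

accuracies = cross_val_score(svm, data_train, lbs_train, cv=5)

# raises NotFittedError: svm itself has not been fitted
pred_test = svm.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)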

[Discussion]:

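You are cross-validating self.svm? Shouldn't it be svm?
I forgot to remove self.svm. This code lives inside a class and I didn't want to paste the whole class.
You may need to include it anyway: we need to see whether self.svm is called before the cross-validation.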

Tags: python scikit-learn svm cross-validation

[Solution 1]:

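That's right: cross_val_score does not return a fitted model. It fits clones of the estimator you pass in, so your original svm object is never fitted. In your example you have cv=5, which means the model is fitted 5 times, once per fold. So which of those fits do you want? The last one?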
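A minimal sketch of that behavior, assuming synthetic data from make_classification as a stand-in for the question's data_train and lbs_train (probability=True is omitted only to keep it fast; it does not change the point). It shows that cross_val_score returns one score per fold and leaves the estimator you passed in unfitted:

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for data_train / lbs_train (assumption, not the asker's data)
data_train, lbs_train = make_classification(n_samples=200, n_features=10, random_state=0)

svm = SVC(kernel='rbf', random_state=42)
accuracies = cross_val_score(svm, data_train, lbs_train, cv=5)
print(accuracies.shape)  # (5,): one accuracy per fold

# cross_val_score fitted clones of svm internally, so svm itself is still unfitted
try:
    svm.predict(data_train)
    still_unfitted = False
except NotFittedError:
    still_unfitted = True
print(still_unfitted)  # True
```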

The function cross_val_score is a simpler version of sklearn.model_selection.cross_validate, which returns not only the scores but also additional information.

So you can do this:

from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

svm = SVC(kernel='rbf', probability=True, random_state=42)

cv_results = cross_validate(svm, data_train, lbs_train, cv=5, return_estimator=True)
# cv_results is a dict with the following keys:
# 'test_score'  - the per-fold scores, i.e. what cross_val_score returns
# 'train_score' - only present if return_train_score=True
# 'fit_time'
# 'score_time'
# 'estimator'   - the cv fitted estimators (one per fold), only present if return_estimator=True

accuracies = cv_results['test_score']  # what you had before

svms = cv_results['estimator']
print(len(svms))  # 5

svm = svms[-1]  # the SVM fitted on the last fold's training split, or pick any you want

pred_test = svm.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)
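To see the dict keys described in the comments above, and to check that test_score matches what cross_val_score returns, here is a small sketch on synthetic make_classification data (an assumption standing in for the real datasets). return_train_score=True is passed so that 'train_score' shows up:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.svm import SVC

# Synthetic data (assumption); any classification dataset works the same way
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
svm = SVC(kernel='rbf', random_state=42)

# 'train_score' is only included when return_train_score=True
cv_results = cross_validate(svm, X, y, cv=5, return_estimator=True, return_train_score=True)
print(sorted(cv_results))
# ['estimator', 'fit_time', 'score_time', 'test_score', 'train_score']

# cv=5 on a classifier uses the same unshuffled stratified splits in both calls,
# so the per-fold test scores are identical
same = np.allclose(cv_results['test_score'], cross_val_score(svm, X, y, cv=5))
print(same)  # True
```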

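Note that here you have to pick one of the 5 fitted SVMs. Ideally, though, you use cross-validation only to estimate how well the model performs, so you don't need to repeat that evaluation at the end. Then you fit the model once more, this time on all of the data, and that is the model you actually use in production.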
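A sketch of that workflow, assuming synthetic data split with train_test_split to mimic the question's separate training and test sets (the final test-set check is kept only because the question has a separate test set). Cross-validation produces the performance estimate, and a single model is then refit on all of the training data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins for data_train/lbs_train and data_test/lbs_test (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
data_train, data_test, lbs_train, lbs_test = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel='rbf', random_state=42)

# 1. Cross-validation only estimates how well this configuration generalizes
accuracies = cross_val_score(svm, data_train, lbs_train, cv=5)
print(f"CV accuracy: {accuracies.mean():.3f} +/- {accuracies.std():.3f}")

# 2. Refit one model on all of the training data; this fits svm itself
svm.fit(data_train, lbs_train)

# 3. The refitted model can now predict on the held-out test set
pred_test = svm.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)
print(f"Test accuracy: {accuracy:.3f}")
```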

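One more note: you mentioned that you want this to fine-tune the model's parameters. For that, you should look into hyperparameter optimization. For example, https://datascience.stackexchange.com/a/36087/54395 shows how to combine cross-validation with a defined parameter search space.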
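One common way to do this in scikit-learn is GridSearchCV; the linked answer may use a different tool. A minimal sketch, assuming synthetic data and a hypothetical, purely illustrative grid over C and gamma. With the default refit=True, the best parameter combination is refit on all of data_train, so the search object can predict directly:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins for the question's datasets (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
data_train, data_test, lbs_train, lbs_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical search space; values are illustrative, not tuned recommendations
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}

search = GridSearchCV(SVC(kernel='rbf', random_state=42), param_grid, cv=5, scoring='accuracy')
# Runs 5-fold CV for each of the 9 combinations, then (refit=True by default)
# refits the best combination on all of data_train
search.fit(data_train, lbs_train)
print(search.best_params_, round(search.best_score_, 3))

# The refitted best model is available, so predicting on the test set works
pred_test = search.predict(data_test)
accuracy = accuracy_score(lbs_test, pred_test)
print(f"Test accuracy: {accuracy:.3f}")
```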

[Discussion]: