Why are the RMSE and MSE so large when using XGBoost? (Answers)

【Question title】: Why are the RMSE and MSE so large when using XGBoost?
【Posted】: 2021-12-11 10:51:29
【Question】:

I am learning XGBoost, and the MAE and RMSE values I get are very large. How is that possible?

Here is the code I am using in Python:

import xgboost as xgb

# X (feature matrix) and y (target values) are assumed to be loaded already.

# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary: params
# "reg:squarederror" is the current name for the deprecated "reg:linear" objective.
params = {"objective": "reg:squarederror", "max_depth": 4}

# Perform cross-validation: cv_results
cv_results = xgb.cv(
    dtrain=housing_dmatrix,
    params=params,
    nfold=4,
    num_boost_round=5,
    metrics="rmse",
    as_pandas=True,
    seed=123,
)

# Print cv_results
print(cv_results)

# Extract and print final boosting round metric
print(cv_results["test-rmse-mean"].tail(1))


    train-rmse-mean  train-rmse-std  test-rmse-mean  test-rmse-std
0    141767.535156      429.452682   142980.429688    1193.794436
1    102832.542969      322.473304   104891.392578    1223.157623
2     75872.617187      266.469946    79478.935547    1601.344218
3     57245.651367      273.625016    62411.921875    2220.149857
4     44401.297851      316.422372    51348.281250    2963.378741
    51348.28125

【Comments on the question】:

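Judging from the formula, RMSE amplifies large errors and is more easily affected by outliers.
BTW, if you want to look at the overall error, check the MSE; if you want to look at the overall error and its stability, check the RMSE.
This case is not about outliers. I think you do not understand how to interpret the metrics.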

Tags: python machine-learning statistics regression xgboost

【Solution 1】:

I think your problem is how to interpret the metrics. First, let me explain what they stand for:

MSE stands for mean squared error.
RMSE stands for root mean squared error.
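
For reference, the standard definitions can be written as follows, where y_i is the true value, \hat{y}_i is the prediction, and n is the number of samples (this notation is assumed here; it does not appear in the original answer):

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

RMSE is therefore expressed in the same units as the target, while MSE is in squared units.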

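This means that both metrics depend on the scale of the values you are predicting. An RMSE of about 51,000 would be enormous if you were predicting the number of seats in a car, which ranges from 2 to 7. If instead you are predicting values that range from 1 to 100 million, the same RMSE is very low. That is why you should also look at a scale-independent metric such as MAPE (mean absolute percentage error), which expresses the error as a fraction of the true value. For a reasonable model it usually falls between 0 and 1, although it can exceed 1 when predictions are far off or the true values are close to zero.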
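
The scale dependence can be illustrated with a minimal sketch in plain Python. The helper names `rmse` and `mape` and all the numbers are made up for illustration; they are not taken from the question's dataset. Both sets of predictions are exactly 10% too high, yet the RMSE differs by a factor of 50,000 because the targets do.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y_true."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%).

    Assumes no true value is zero.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)


# Hypothetical targets: seat counts, and prices that are 50,000 times larger.
seats_true = [2.0, 4.0, 5.0, 7.0]
prices_true = [s * 50_000.0 for s in seats_true]

# Every prediction is 10% too high in both cases.
seats_pred = [s * 1.1 for s in seats_true]
prices_pred = [p * 1.1 for p in prices_true]

print("RMSE, seats: ", rmse(seats_true, seats_pred))
print("RMSE, prices:", rmse(prices_true, prices_pred))

# MAPE is roughly 0.10 in both cases, because it is relative to the target.
print("MAPE, seats: ", mape(seats_true, seats_pred))
print("MAPE, prices:", mape(prices_true, prices_pred))
```

A quick sanity check along the same lines is to compare the reported RMSE with the typical size of `y`, for example its mean or standard deviation.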

See this link for more information about MAPE and how to use it with scikit-learn.
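
As a rough sketch of what this could look like, the example below uses `mean_absolute_percentage_error` and `mean_squared_error` from `sklearn.metrics` (the former is available in scikit-learn 0.24 and later). The prices and predictions are invented for illustration and are not the asker's data.

```python
import math

from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

# Made-up house prices and predictions, for illustration only.
y_true = [200_000.0, 150_000.0, 320_000.0, 410_000.0]
y_pred = [190_000.0, 165_000.0, 300_000.0, 430_000.0]

# RMSE is the square root of MSE, so it is in the same units as the target.
rmse = math.sqrt(mean_squared_error(y_true, y_pred))

# scikit-learn returns MAPE as a fraction, not as a percentage.
mape = mean_absolute_percentage_error(y_true, y_pred)

print(f"RMSE: {rmse:.0f} (same units as the target)")
print(f"MAPE: {mape:.3f} (multiply by 100 for a percentage)")
```

Here an RMSE in the tens of thousands goes together with a MAPE of only a few percent, which is the kind of comparison that makes a large-looking RMSE easier to interpret.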

【Discussion】: