【Posted】: 2017-09-18 10:23:45
【Problem description】:
I am trying out XGBoost for a regression problem. During hyperparameter tuning, XGBoost's early-stopping cv never stops on my code/data, no matter what the num_boost_round parameter is set to. It also produces worse RMSE scores than GridSearchCV. What am I doing wrong here?
Also, if I am not doing anything wrong, what advantage does early-stopping cv have over GridSearchCV?
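For reference, this is how I understand early_stopping_rounds is supposed to behave in xgb.cv: training stops once the test metric has not improved for that many consecutive rounds, and the returned DataFrame is truncated at the best iteration. A minimal sketch on synthetic data (the random data and the names X, y, dtrain, res are mine, not from my real dataset):

import numpy as np
import xgboost as xgb

# Synthetic regression data, just to demonstrate the mechanism.
rng = np.random.RandomState(0)
X = rng.rand(500, 10)
y = X[:, 0] * 10 + rng.randn(500)  # strong signal, so CV RMSE flattens quickly

dtrain = xgb.DMatrix(X, label=y)
res = xgb.cv({'eta': 0.1, 'max_depth': 3, 'objective': 'reg:linear'},
             dtrain,
             num_boost_round=1000,
             nfold=5,
             metrics=['rmse'],
             early_stopping_rounds=10)

# If early stopping fires, the returned frame is cut off at the best
# iteration, so len(res) < num_boost_round. On data this easy it should
# stop long before round 1000.
print len(res)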
GridSearchCV:
import math
import xgboost as xgb
from sklearn.metrics import mean_squared_error, make_scorer
from sklearn.grid_search import GridSearchCV  # deprecated; newer sklearn uses sklearn.model_selection and cv_results_

def RMSE(y_true, y_pred):
    rmse = math.sqrt(mean_squared_error(y_true, y_pred))
    print 'RMSE: %2.3f' % rmse
    return rmse

scorer = make_scorer(RMSE, greater_is_better=False)

cv_params = {'max_depth': [2, 8], 'min_child_weight': [1, 5]}
ind_params = {'learning_rate': 0.01, 'n_estimators': 1000,
              'seed': 0, 'subsample': 0.8, 'colsample_bytree': 0.8,
              'reg_alpha': 0, 'reg_lambda': 1}  # regularization => L1: alpha, L2: lambda

optimized_GBM = GridSearchCV(xgb.XGBRegressor(**ind_params),
                             cv_params,
                             scoring=scorer,
                             cv=5, verbose=1,
                             n_jobs=1)
optimized_GBM.fit(train_X, train_Y)
optimized_GBM.grid_scores_
Output:
[mean: -62.42736, std: 5.18004, params: {'max_depth': 2, 'min_child_weight': 1},
mean: -62.42736, std: 5.18004, params: {'max_depth': 2, 'min_child_weight': 5},
mean: -57.11358, std: 3.62918, params: {'max_depth': 8, 'min_child_weight': 1},
mean: -57.12148, std: 3.64145, params: {'max_depth': 8, 'min_child_weight': 5}]
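A side note on reading these numbers: make_scorer(..., greater_is_better=False) negates the score, so mean: -57.11 corresponds to an RMSE of roughly 57.11. A quick sketch for pulling out the winner (assuming the fit above has completed):

print optimized_GBM.best_params_  # best grid point, here {'max_depth': 8, 'min_child_weight': 1}
print -optimized_GBM.best_score_  # undo the sign flip to recover the RMSE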
XGBoost CV:
our_params = {'eta': 0.01, 'max_depth': 8, 'min_child_weight': 1,
              'seed': 0, 'subsample': 0.8, 'colsample_bytree': 0.8,
              'objective': 'reg:linear', 'booster': 'gblinear',
              'eval_metric': 'rmse',
              'silent': False}

num_rounds = 1000

cv_xgb = xgb.cv(params=our_params,
                dtrain=train_mat,
                num_boost_round=num_rounds,
                nfold=5,
                metrics=['rmse'],            # make sure you enter metrics inside a list or you may encounter issues!
                early_stopping_rounds=100,   # look for early stopping that minimizes error
                verbose_eval=True)

print cv_xgb.shape
print cv_xgb.tail(5)
Output:
(1000, 4)
test-rmse-mean test-rmse-std train-rmse-mean train-rmse-std
995 89.937926 0.263546 89.932823 0.062540
996 89.937773 0.263537 89.932671 0.062537
997 89.937622 0.263526 89.932517 0.062535
998 89.937470 0.263516 89.932364 0.062532
999 89.937317 0.263510 89.932210 0.062525
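A quick way to check whether early stopping ever fired (a sketch against the cv_xgb DataFrame returned above):

# When early stopping triggers, xgb.cv returns a frame truncated at the best
# iteration, so a full 1000-row frame means all rounds ran without stopping.
print len(cv_xgb) < num_rounds           # False here: no early stop occurred
print cv_xgb['test-rmse-mean'].idxmin()  # round with the lowest CV RMSE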
【Comments】:
Tags: python random-forest xgboost