【Question Title】:'OverflowError: Python int too large to convert to C long' error when I'm using RandomizedSearchCV on XGBoost
【Posted】:2022-01-26 08:08:01
【Problem Description】:

Here is my code:

def create_XGBoost_classifier(X_train, y_train):
    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import KFold, RandomizedSearchCV

    kf = KFold(n_splits=10, shuffle=True, random_state=42)
    xgb_clf = xgb.XGBClassifier(random_state=42)

    learning_rate = get_learning_rate()  # helper defined elsewhere; returns a list of learning rates

    xgb_hyperparameters = {
        # 'n_estimators': np.arange(2000, 5001, 1000).tolist(),
        'n_estimators': [100],
        'max_depth': np.arange(3, 16, 1).tolist(),
        'learning_rate': learning_rate,
        'min_child_weight': np.arange(0, 8.01, 0.1).tolist(),
        'subsample': np.arange(0.25, 1.01, 0.025).tolist(),
        'colsample_bytree': np.arange(0.3, 1.01, 0.025).tolist(),
        'colsample_bylevel': np.arange(0.3, 1.01, 0.025).tolist(),
        # note: duplicate dict key; this second 'min_child_weight' entry overrides the one above
        'min_child_weight': np.arange(0.25, 10.01, 0.025).tolist(),
        # 'gamma': np.arange(0, 5.01, 0.25).tolist(),
        # 'reg_lambda': [0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0],
        # 'scale_pos_weight': [1, 10, 25, 50, 75, 99, 100, 1000],
        'verbosity': [0]
    }

    xgb_random = RandomizedSearchCV(estimator=xgb_clf,
                                    param_distributions=xgb_hyperparameters,
                                    n_iter=150, cv=kf, verbose=1,
                                    random_state=42, n_jobs=-1,  # scoring='precision'
                                    )
    xgb_random.fit(X_train, y_train)

When the following line runs:

xgb_random.fit(X_train, y_train)

I get this error:

  File "sklearn\utils\_random.pyx", line 223, in sklearn.utils._random.sample_without_replacement
OverflowError: Python int too large to convert to C long

The strange thing is that if I remove this line:

'min_child_weight': np.arange(0.25, 10.01, 0.025).tolist(),

so that the dictionary becomes:

xgb_hyperparameters = {
    # 'n_estimators': np.arange(2000, 5001, 1000).tolist(),
    'n_estimators': [100],
    'max_depth': np.arange(3, 16, 1).tolist(),
    'learning_rate': learning_rate,
    'min_child_weight': np.arange(0, 8.01, 0.1).tolist(),
    'subsample': np.arange(0.25, 1.01, 0.025).tolist(),
    'colsample_bytree': np.arange(0.3, 1.01, 0.025).tolist(),
    'colsample_bylevel': np.arange(0.3, 1.01, 0.025).tolist(),
    # 'gamma': np.arange(0, 5.01, 0.25).tolist(),
    # 'reg_lambda': [0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0],
    # 'scale_pos_weight': [1, 10, 25, 50, 75, 99, 100, 1000],
    'verbosity': [0]
}

it works. I think it would also work if I removed a different line instead of the 'min_child_weight' one.

【Question Discussion】:

    Tags: python scikit-learn classification data-science xgboost


    【Solution 1】:

    I've solved it. The grid_size is larger than the maximum value a C long can hold, and this part of scikit-learn's code (it runs when the fit function is called) raises the error:

        def __iter__(self):
            rng = check_random_state(self.random_state)

            # if all distributions are given as lists, we want to sample without
            # replacement
            if self._is_all_lists():
                # look up sampled parameter settings in parameter grid
                param_grid = ParameterGrid(self.param_distributions)
                grid_size = len(param_grid)
                n_iter = self.n_iter

    I narrowed the ranges of the hyperparameter values and it works.
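
    An alternative that avoids the overflow without shrinking the lists is to pass the continuous hyperparameters as scipy.stats distributions instead of lists. As soon as at least one value is not a list, ParameterSampler draws every parameter independently for each of the n_iter candidates and never builds the full ParameterGrid, so grid_size is never computed. A minimal sketch of that approach (the learning_rate bounds here are assumed placeholders, since get_learning_rate() is not shown in the question):

        import numpy as np
        import xgboost as xgb
        from scipy.stats import uniform
        from sklearn.model_selection import KFold, RandomizedSearchCV

        # Continuous parameters as distributions: uniform(loc, scale) samples
        # from the interval [loc, loc + scale].
        xgb_hyperparameters = {
            'n_estimators': [100],
            'max_depth': np.arange(3, 16, 1).tolist(),
            'learning_rate': uniform(0.01, 0.29),     # assumed range 0.01 to 0.30
            'min_child_weight': uniform(0.25, 9.75),  # range 0.25 to 10.0
            'subsample': uniform(0.25, 0.75),
            'colsample_bytree': uniform(0.3, 0.7),
            'colsample_bylevel': uniform(0.3, 0.7),
            'verbosity': [0],
        }

        xgb_random = RandomizedSearchCV(
            estimator=xgb.XGBClassifier(random_state=42),
            param_distributions=xgb_hyperparameters,
            n_iter=150,
            cv=KFold(n_splits=10, shuffle=True, random_state=42),
            verbose=1, random_state=42, n_jobs=-1,
        )
        # xgb_random.fit(X_train, y_train) then runs as before, with no ParameterGrid involved.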

    【Discussion】:
