【Question title】: Getting stuck using grid search on H2O's XGBoost in Python
【Posted】: 2018-08-27 02:35:21
【Question】:

I never had this problem when using grid search with xgboost from Python. But today, when I tried grid search on H2O's XGBoost (using H2O's own grid search, H2OGridSearch), it would not go through. Here is the code:

from h2o.estimators.xgboost import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch

# Grid of values to search over (H2O name, with the native XGBoost name in the comment)
xgboost_hyperparameters = {'max_depth': range(2, 10),
                           'min_rows': range(1, 9),                                    # min_child_weight
                           'sample_rate': [i/10 for i in range(5, 10)],                # subsample
                           'col_sample_rate_per_tree': [i/10 for i in range(5, 10)]}   # colsample_bytree

# Fixed settings shared by every model in the grid
param = {'booster': 'gbtree',
         'col_sample_rate': 1,                      # colsample_bylevel
         'keep_cross_validation_predictions': True,
         'learn_rate': 0.1,
         'max_abs_leafnode_pred': 1.0,
         'nfolds': 10,
         'ntrees': 24,
         'reg_alpha': 0.0,
         'reg_lambda': 5.0}

xgboost_grid1 = H2OGridSearch(model=H2OXGBoostEstimator(**param),
                              grid_id='xgboost_grid1',
                              hyper_params=xgboost_hyperparameters)
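The comments in the code above pair each H2O parameter with its native XGBoost name. When porting a grid written for native xgboost, that pairing can be kept in one table. The sketch below is a hypothetical helper (XGB_TO_H2O and to_h2o_names are not part of h2o); the table only contains the four pairs stated in the question's comments, and the grid values are given as plain lists.

```python
# Native XGBoost name -> H2O XGBoost name, taken from the comments in the
# question's code. Hypothetical table; h2o does not ship this helper.
XGB_TO_H2O = {
    "min_child_weight": "min_rows",
    "subsample": "sample_rate",
    "colsample_bytree": "col_sample_rate_per_tree",
    "colsample_bylevel": "col_sample_rate",
}


def to_h2o_names(params):
    """Rename native XGBoost keys to their H2O equivalents; other keys pass through."""
    return {XGB_TO_H2O.get(name, name): value for name, value in params.items()}


# A grid written with native XGBoost names, values as plain lists.
native_grid = {
    "max_depth": list(range(2, 10)),
    "min_child_weight": list(range(1, 9)),
    "subsample": [i / 10 for i in range(5, 10)],
    "colsample_bytree": [i / 10 for i in range(5, 10)],
}

h2o_grid = to_h2o_names(native_grid)
print(sorted(h2o_grid))  # ['col_sample_rate_per_tree', 'max_depth', 'min_rows', 'sample_rate']
```

Keys without an entry in the table, such as max_depth, are passed through unchanged because the name is the same in both libraries.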

This cell ran without errors in Jupyter Notebook, but when I started training the model with the code below, it raised an error:

xgboost_grid1.train(x=x, y=y,
                    training_frame=train,
                    validation_frame=valid)

Error message:

H2OResponseError                          Traceback (most recent call last)
<ipython-input-15-b1393b94399c> in <module>()
      1 xgboost_grid1.train(x=x, y=y,
      2                    training_frame=train,
----> 3                    validation_frame=valid)
      4 

~/anaconda3/lib/python3.6/site-packages/h2o/grid/grid_search.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, **params)
    206         x = list(xset)
    207         parms["x"] = x
--> 208         self.build_model(parms)
    209 
    210 

~/anaconda3/lib/python3.6/site-packages/h2o/grid/grid_search.py in build_model(self, algo_params)
    221         if is_auto_encoder and y is not None: raise ValueError("y should not be specified for autoencoder.")
    222         if not is_unsupervised and y is None: raise ValueError("Missing response")
--> 223         self._model_build(x, y, training_frame, validation_frame, algo_params)
    224 
    225 

~/anaconda3/lib/python3.6/site-packages/h2o/grid/grid_search.py in _model_build(self, x, y, tframe, vframe, kwargs)
    243         rest_ver = kwargs.pop("_rest_version") if "_rest_version" in kwargs else None
    244 
--> 245         grid = H2OJob(h2o.api("POST /99/Grid/%s" % algo, data=kwargs), job_type=(algo + " Grid Build"))
    246 
    247         if self._future:

~/anaconda3/lib/python3.6/site-packages/h2o/h2o.py in api(endpoint, data, json, filename, save_to)
    101     # type checks are performed in H2OConnection class
    102     _check_connection()
--> 103     return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
    104 
    105 

~/anaconda3/lib/python3.6/site-packages/h2o/backend/connection.py in request(self, endpoint, data, json, filename, save_to)
    400                                     auth=self._auth, verify=self._verify_ssl_cert, proxies=self._proxies)
    401             self._log_end_transaction(start_time, resp)
--> 402             return self._process_response(resp, save_to)
    403 
    404         except (requests.exceptions.ConnectionError, requests.exceptions.HTTPError) as e:

~/anaconda3/lib/python3.6/site-packages/h2o/backend/connection.py in _process_response(response, save_to)
    723         # Client errors (400 = "Bad Request", 404 = "Not Found", 412 = "Precondition Failed")
    724         if status_code in {400, 404, 412} and isinstance(data, (H2OErrorV3, H2OModelBuilderErrorV3)):
--> 725             raise H2OResponseError(data)
    726 
    727         # Server errors (notably 500 = "Server Error")

H2OResponseError: Server error water.exceptions.H2OIllegalArgumentException:
  Error: Can't parse the hyper_parameters dictionary; got error: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 28 path $. for raw value: {'max_depth': range(2, 10), 'min_rows': range(1, 9), 'sample_rate': [0.5, 0.6, 0.7, 0.8, 0.9]}
  Request: POST /99/Grid/xgboost
    data: {'hyper_parameters': "{'max_depth': range(2, 10), 'min_rows': range(1, 9), 'sample_rate': [0.5, 0.6, 0.7, 0.8, 0.9]}", 'booster': 'gbtree', 'col_sample_rate': '1', 'keep_cross_validation_predictions': 'True', 'learn_rate': '0.1', 'max_abs_leafnode_pred': '1.0', 'nfolds': '10', 'ntrees': '24', 'reg_alpha': '0.0', 'reg_lambda': '5.0', 'training_frame': 'py_4_sid_80f1', 'validation_frame': 'py_5_sid_80f1', 'response_column': 'label', 'grid_id': 'xgboost_grid1'}
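The raw value in the error message is exactly Python's string form of the hyper-parameter dictionary, with range(2, 10) left as literal text, which the server then fails to parse as JSON. This can be reproduced in plain Python without an H2O cluster. In the sketch, Python's json module only stands in for "a JSON encoder"; it is not what H2O uses internally (the server side uses Gson, per the exception name).

```python
import json

# The grid as it was actually sent (see the raw value in the error):
# two range objects and one list.
hp = {
    "max_depth": range(2, 10),
    "min_rows": range(1, 9),
    "sample_rate": [i / 10 for i in range(5, 10)],
}

# str() reproduces the raw value from the H2OResponseError verbatim.
raw_value = str(hp)
print(raw_value)


def is_json_serializable(obj):
    """Return True if json.dumps can encode obj, False if it raises TypeError."""
    try:
        json.dumps(obj)
    except TypeError:
        return False
    return True


# A range object has no JSON representation; a list of ints does.
print(is_json_serializable(hp))
print(is_json_serializable({"max_depth": list(range(2, 10))}))
```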

I need help, since I could find very little documentation on this, either on H2O's website or here.

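【Comments】:

  • Which version of h2o? 0xdata.atlassian.net/browse/PUBDEV-4704 complains that it doesn't work, but that is a different error message, and the ticket says it was fixed in 3.14.0.1.
  • @Darren Cook H2O cluster version: 3.18.0.2

Tags: python h2o xgboost grid-search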


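【Solution 1】:

You have to use:

list(range(...))

instead of:

range(...)

for example 'max_depth' : list(range(2,10)), and likewise for 'min_rows'. In Python 3, range() returns a lazy range object rather than a list. As the data: line of the error shows, the client sends the hyper-parameter dictionary to the server as its Python string form, so the value arrives as the literal text range(2, 10), which the server cannot parse as JSON. Wrapping it in list() makes it arrive as [2, 3, 4, 5, 6, 7, 8, 9] instead.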
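To apply this fix to the whole grid at once, a small helper can expand every value with list() before the dictionary is handed to H2OGridSearch. This is a minimal sketch; materialize_hyper_params is a hypothetical name, not an h2o function. It assumes each value is already an iterable of candidate values, since a bare string would be split into single characters.

```python
def materialize_hyper_params(hyper_params):
    """Return a copy of hyper_params in which every value is a plain list.

    Hypothetical helper (not part of h2o): range objects, tuples and other
    iterables are expanded with list(), so str() of the result contains
    only list literals.
    """
    return {name: list(values) for name, values in hyper_params.items()}


# The grid from the question, still written with range().
xgboost_hyperparameters = {
    "max_depth": range(2, 10),
    "min_rows": range(1, 9),                                     # min_child_weight
    "sample_rate": [i / 10 for i in range(5, 10)],               # subsample
    "col_sample_rate_per_tree": [i / 10 for i in range(5, 10)],  # colsample_bytree
}

fixed = materialize_hyper_params(xgboost_hyperparameters)
print(fixed["max_depth"])  # [2, 3, 4, 5, 6, 7, 8, 9]
```

The result can then be passed as hyper_params=fixed in the question's H2OGridSearch(...) call in place of the original dictionary.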
