Exception during xgboost prediction: can not initialize DMatrix from DMatrix

【Question title】: Exception during xgboost prediction: can not initialize DMatrix from DMatrix
【Posted】: 2019-04-29 15:01:33
【Question】:

I trained an xgboost model in Python using the Scikit-Learn Python API and serialized it with the pickle library. I uploaded the model to ML Engine, but when I try to make an online prediction I get the following exception:

Prediction failed: Exception during xgboost prediction: can not initialize DMatrix from DMatrix

A sample of the JSON I send for prediction is below:

{  
   "instances":[  
      [  
         24.90625,
         21.6435643564356,
         20.3762376237624,
         24.3679245283019,
         30.2075471698113,
         28.0947368421053,
         16.7797359774725,
         14.9262079299572,
         17.9888028979966,
         15.3333284503293,
         19.6535308744024,
         17.1501961307627,
         0.0,
         0.0,
         0.0,
         0.0,
         0.0,
         509.0,
         497.0,
         439.0,
         427.0,
         407.0,
         1.0,
         1.0,
         1.0,
         1.0,
         1.0,
         2.0,
         23.0,
         10.0,
         58.0,
         11.0,
         20.0,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.9423076923077,
         26.3082269243683,
         23.6212606363851,
         22.6752334301282,
         27.4343583104833,
         34.0090408101173,
         11.1991944104063,
         7.33420726455092,
         8.15160392948917,
         11.4119236389594,
         17.9429092915607,
         18.0573102225845,
         32.8902876598084,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.0028328611898017,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0531491870801522
      ]
   ]
}
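
For reference, a request body of this shape can be produced from the feature rows in a few lines of Python. This is only a minimal sketch: the helper name `build_request_body` is hypothetical, and the four values in the usage line are a shortened stand-in for the full feature vector shown above.

```python
import json


def build_request_body(rows):
    """Return the JSON text for an online-prediction request.

    `rows` is any 2-D sequence of numbers (a list of lists or a NumPy array).
    Every value is converted to a built-in float, because json.dumps cannot
    serialize some NumPy scalar types such as numpy.float32.
    """
    instances = [[float(value) for value in row] for row in rows]
    return json.dumps({"instances": instances})


# Shortened, illustrative feature vector (not the full row from the question).
body = build_request_body([[24.90625, 21.6435643564356, 0.0, 509.0]])
print(body)
```

The outer list holds one entry per instance, so several rows can be scored in a single request by passing more than one inner list.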

I trained my model with the following code:

import xgboost as xgb


def _train_model(X, y):
    # Fit a gradient-boosted tree classifier through the scikit-learn wrapper.
    clf = xgb.XGBClassifier(max_depth=6,
                            learning_rate=0.01,
                            n_estimators=100,
                            n_jobs=-1)
    clf.fit(X, y)
    return clf

where X and y are both numpy.ndarray:

Type of X: <class 'numpy.ndarray'> Type of y: <class 'numpy.ndarray'>

I am also using xgboost 0.72.1, Python 3.5, and ML runtime 1.9.

Does anyone know what the root of the problem is?

Thanks!

【Comments】:

print(type(X))
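@EranMoshe X and y are both numpy.ndarray (I added this to the question)
Not during training. At prediction time.
For prediction I use ML Engine and send JSON through the REST API, with the key "instances" and a list of lists as the value. A sample JSON is in the question.

Tags: python-3.x xgboost google-cloud-ml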

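【Solution 1】:

It seems the problem is caused by pickling. I was able to reproduce it and am working on a fix, but in the meantime could you try exporting your classifier as shown below?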

clf._Booster.save_model('./model.bst')

This should unblock you for now. If it does not, feel free to contact cloudml-feedback@google.com.

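【Comments】:

Could you provide an example of how such a model should be loaded, @N3da?

【Solution 2】:

I ran into a similar issue, or a feature mismatch, when I tried to score test data with a trained XGBoost model that had been dumped in .pkl format. However, after saving the model in .bst format I was able to score the same training data without any problem. It seems that the two implementations, for the .pkl and .bst formats, differ on the XGBoost side.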

【Comments】:

【Solution 3】:

Going a step further, and answering kuza's question above about loading the saved model:

Save the model:

clf._Booster.save_model('./model.bst')

Load the saved model:

import xgboost

model = xgboost.Booster({'nthread': 4})  # initialize before loading model
model.load_model('./model.bst')  # load model

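This solved 2 problems I ran into when using pickle on the model. Problem 1 was a strange exception: ValueError: feature_names mismatch:

Also check whether you are calling predict_proba on the loaded model and getting a strange exception. The fix is to call the plain predict function instead of predict_proba.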
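
Some background on why `predict` is the call to use: a model loaded this way is a bare `Booster`, which has a `predict` method that takes a `DMatrix` but no `predict_proba`. Assuming a binary classifier trained with the default `binary:logistic` objective (the question does not say how many classes there are), `predict` returns one positive-class probability per row. If downstream code expects the two-column layout of `predict_proba`, it can be rebuilt as in this sketch; the helper name `to_predict_proba_layout` is hypothetical, and the input array is a hand-written stand-in for `model.predict(xgboost.DMatrix(X))`.

```python
import numpy as np


def to_predict_proba_layout(positive_probs):
    """Turn per-row P(class 1) values into an (n_rows, 2) array.

    Column 0 is P(class 0) and column 1 is P(class 1), matching the layout
    that XGBClassifier.predict_proba uses for a binary problem.
    """
    p = np.asarray(positive_probs, dtype=float)
    return np.column_stack((1.0 - p, p))


# Stand-in for the output of model.predict(xgboost.DMatrix(X)) (an assumption).
booster_output = np.array([0.25, 0.5, 0.75])
proba = to_predict_proba_layout(booster_output)
print(proba)
```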

【Comments】: