【Question Title】: Calculating probabilities in XGBoost
【Posted】: 2020-02-02 21:36:11
【Question】:

I am starting to use XGBoost in R and am trying to match the predictions from a binary:logistic model with the predictions generated by a custom log-loss objective function. I would expect the following two calls to predict to produce identical results:

require(xgboost)

loglossobj <- function(preds, dtrain) {
  # xgboost hands raw margin scores to a custom objective, so apply the
  # sigmoid here before computing the log-loss gradient and hessian
  labels <- getinfo(dtrain, "label")
  preds <- 1/(1 + exp(-preds))
  grad <- preds - labels        # first derivative of log loss w.r.t. the margin
  hess <- preds * (1 - preds)   # second derivative
  return(list(grad = grad, hess = hess))
}

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

model <- xgboost(data = train$data, label = train$label, nrounds = 2, objective = "binary:logistic")
preds <- predict(model, test$data)
print(head(preds))

model <- xgboost(data = train$data, label = train$label, nrounds = 2, objective = loglossobj, eval_metric = "error")
preds <- predict(model, test$data)
x <- 1 / (1 + exp(-preds))
print(head(x))

The model output for the custom log-loss objective does not have the logistic transformation 1/(1+exp(-x)) applied. However, when I apply it myself, the two calls to predict yield different probabilities:

[1] 0.2582498 0.7433221 0.2582498 0.2582498 0.2576509 0.2750908


[1] 0.3076240 0.7995583 0.3076240 0.3076240 0.3079328 0.3231709
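
As a sanity check, the built-in objective can be asked for its raw margin scores via predict's outputmargin argument; applying the same transform to those reproduces its probability output, so the custom objective is at least consistent in returning untransformed scores:

model <- xgboost(data = train$data, label = train$label, nrounds = 2, objective = "binary:logistic")
margins <- predict(model, test$data, outputmargin = TRUE)
print(head(1 / (1 + exp(-margins))))  # matches the first set of probabilities above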

I suspect there is a simple explanation. Any suggestions?

【Comments】:

    Tags: r regression logistic-regression xgboost predict


    【Answer 1】:

    It turns out that this behavior is due to initial conditions. xgboost implicitly assumes base_score = 0.5 when binary:logistic or binary:logitraw is invoked, but base_score must be set to 0.0 to replicate their output when using a custom loss function. Here, base_score is the initial prediction score of all instances. For the built-in objectives the default base_score = 0.5 is interpreted as a probability and converted to an initial margin of logit(0.5) = 0, whereas a custom objective uses base_score directly on the margin scale, which is why 0.0 is needed.

    To illustrate, the following R code produces identical predictions in all three cases:

    require(xgboost)
    
    loglossobj <- function(preds, dtrain) {
      labels <- getinfo(dtrain, "label")
      preds <- 1/(1 + exp(-preds))
      grad <- preds - labels
      hess <- preds * (1 - preds)
      return(list(grad = grad, hess = hess))
    }
    
    data(agaricus.train, package='xgboost')
    data(agaricus.test, package='xgboost')
    train <- agaricus.train
    test <- agaricus.test
    
    model <- xgboost(data = train$data, label = train$label, objective = "binary:logistic", nrounds = 10, eta = 0.1, verbose = 0)
    preds <- predict(model, test$data)
    print(head(preds))
    
    model <- xgboost(data = train$data, label = train$label, objective = "binary:logitraw", nrounds = 10, eta = 0.1, verbose = 0)
    preds <- predict(model, test$data)
    x <- 1 / (1 + exp(-preds))
    print(head(x))
    
    model <- xgboost(data = train$data, label = train$label, objective = loglossobj, base_score = 0.0, nrounds = 10, eta = 0.1, verbose = 0)
    preds <- predict(model, test$data)
    x <- 1 / (1 + exp(-preds))
    print(head(x))
    

    This outputs:

    [1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717
    [1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717
    [1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717
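
    Conversely, leaving base_score at its default of 0.5 for the custom objective (a quick check, just dropping base_score = 0.0 from the call above) reproduces the original mismatch, since 0.5 is then used directly as the initial margin:

    model <- xgboost(data = train$data, label = train$label, objective = loglossobj, nrounds = 10, eta = 0.1, verbose = 0)
    preds <- predict(model, test$data)
    x <- 1 / (1 + exp(-preds))
    print(head(x))   # no longer matches the three identical outputs above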
    

    【Discussion】:
