【Question title】: How to predict accuracy of xgboost binary choice model?
【Posted】: 2020-07-08 09:48:55
【Question】:

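I am building an XGBoost model for binary choice prediction, but I cannot generate predictions. How do I get from the end of this code to actual predictions on the test data? My code has 7 independent variables and one dependent variable, which is a binary choice.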

choice <- dataset_training$choiceprobX
set.seed(1234)
ind <- sample(2, nrow(dataset_training), replace=TRUE, prob=c(0.67, 0.33))
training <- as.matrix(dataset_training[ind==1, 1:7])
head(training)
testing <- as.matrix(dataset_training[ind==2, 1:7])
head(testing)
dataset_trainLabel <- dataset_training[ind==1, 8]
head(dataset_trainLabel)
dataset_testLabel <- dataset_training[ind==2, 8]
head(dataset_testLabel)
xgb.train <- xgb.DMatrix(data=training,label=dataset_trainLabel)
xgb.test <- xgb.DMatrix(data=testing,label=dataset_testLabel)
params = list(
  booster="gbtree",
  eta=0.01,
  max_depth=5,
  gamma=3,
  subsample=0.75,
  colsample_bytree=1,
  objective="binary:logistic",
  eval_metric="logloss"
)
xgb.fit=xgb.train(
  params=params,
  data=xgb.train,
  nrounds=10,
  nthread=1,
  early_stopping_rounds=10,
  watchlist=list(val1=xgb.train,val2=xgb.test),
  verbose=0
)
xgb.fit

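My goal is to generate a confusion matrix, but when I try, I get an error saying that data and reference should be factors with the same levels.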

【Comments】:

    Tags: r machine-learning xgboost


    【Solution 1】:

    Let's use the iris example dataset, since I don't have your data:

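    library(xgboost)

    set.seed(100)
    data = iris
    # Binary label: 1 for versicolor, 0 for the other two species
    data$Species = as.numeric(data$Species=="versicolor")
    # 100 rows for training, the remaining 50 for testing
    idx = sample(nrow(data),100)
    
    dtrain <- xgb.DMatrix(as.matrix(data[idx,-5]), label = data$Species[idx])
    dtest <- xgb.DMatrix(as.matrix(data[-idx,-5]), label = data$Species[-idx])
    
    # verbose is an argument of xgb.train, not a booster parameter
    param <- list(max_depth = 2, eta = 1, nthread = 2,
                 objective = "binary:logistic", eval_metric = "logloss")
    # Datasets to evaluate after each boosting round
    watchlist <- list(train = dtrain, test = dtest)
    xgb.fit  <- xgb.train(param, dtrain, nrounds = 10, watchlist, verbose = 0)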
    

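    To build the confusion matrix, convert the predictions to 0 and 1 (using probability > 0.5 as the cutoff), then pass the resulting table to the confusionMatrix function: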

    library(caret)
    pred = as.numeric(predict(xgb.fit,dtest) >0.5)
    obs = getinfo(dtest, "label")
    
    confusionMatrix(table(pred,obs))
    Confusion Matrix and Statistics
    
        obs
    pred  0  1
       0 34  0
       1  1 15
    
                   Accuracy : 0.98            
                     95% CI : (0.8935, 0.9995)
        No Information Rate : 0.7             
        P-Value [Acc > NIR] : 4.034e-07       
    
                      Kappa : 0.9533          
    
     Mcnemar's Test P-Value : 1               
    
                Sensitivity : 0.9714          
                Specificity : 1.0000          
             Pos Pred Value : 1.0000          
             Neg Pred Value : 0.9375          
                 Prevalence : 0.7000          
             Detection Rate : 0.6800          
       Detection Prevalence : 0.6800          
          Balanced Accuracy : 0.9857          
    
           'Positive' Class : 0            
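
The error in the question ("data and reference should be factors with the same levels") typically shows up when raw probabilities or plain numeric vectors are handed to confusionMatrix. As a rough sketch of the alternative to passing a table, the snippet below thresholds the probabilities, converts predictions and labels to factors with identical levels, and computes accuracy by hand. The values in prob and obs are made up for illustration; in the question's setup they would presumably come from predict(xgb.fit, xgb.test) and dataset_testLabel.

```r
# Hypothetical predicted probabilities and true labels (illustration only)
prob <- c(0.91, 0.12, 0.55, 0.48, 0.03, 0.77)
obs  <- c(1, 0, 0, 1, 0, 1)

# Threshold at 0.5 to turn probabilities into class predictions
pred <- as.numeric(prob > 0.5)

# Use the same levels for both factors so each class appears in the table
# even if the model never predicts one of them
lv     <- c(0, 1)
pred_f <- factor(pred, levels = lv)
obs_f  <- factor(obs,  levels = lv)

# Confusion table: rows are predictions, columns are observed labels
cm <- table(pred = pred_f, obs = obs_f)

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
print(accuracy)
```

With caret loaded, these factors can also be passed directly as confusionMatrix(pred_f, obs_f, positive = "1"), which reports sensitivity and specificity for class 1 rather than for class 0 as in the output above.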
    

    【Comments】:
