【Question title】: Getting Error in running gbm from caret: Error in { : task 1 failed - "inputs must be factors"
【Posted】: 2025-12-08 10:10:01
【Question】:

I am new to R and am trying to learn and do machine learning in R.

I get this error when running gbm from caret: Error in { : task 1 failed - "inputs must be factors"

With the same parameters, many other algorithms, such as rf and adaboost, run perfectly.

Reference code:

library(caret)

fitCtrl_2 <- trainControl(
  method = "cv",
  # repeats = 5,
  number = 10,
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
) 
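Because fitCtrl_2 combines classProbs = TRUE with twoClassSummary, caret expects the outcome to be a two-level factor whose level names are valid R variable names. A minimal base-R sketch of preparing such an outcome, assuming the raw outcome is a character or numeric vector; prepare_outcome is a hypothetical helper, not part of caret:

```r
# Hypothetical helper (not part of caret): turn a raw outcome into a
# two-level factor whose level names are valid R variable names,
# which caret needs when classProbs = TRUE.
prepare_outcome <- function(y) {
  y <- factor(y)
  if (nlevels(y) != 2) {
    stop("twoClassSummary expects exactly two classes")
  }
  # e.g. "0"/"1" become "X0"/"X1"
  levels(y) <- make.names(levels(y), unique = TRUE)
  y
}

out1 <- prepare_outcome(c("yes", "no", "yes"))
print(levels(out1))  # "no"  "yes"

out2 <- prepare_outcome(c(0, 1, 1))
print(levels(out2))  # "X0" "X1"
```

twoClassSummary reports Sens for the first factor level, so relevel() can be used to put the class of interest first.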

The following code throws the error:

set.seed(123)

system.time(

model_gbm <- train(pull(y) ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired, 
                  data = train, 
                  method = "gbm",   # Added for gbm
                  distribution="gaussian",   # Added for gbm
                  metric = "ROC",
                  bag.fraction=0.75,   # Added for gbm
                  # tuneLength = 10,
                  trControl = fitCtrl_2)
)

The following code runs perfectly on the same data:

SVM model:

set.seed(123)

system.time(

model_svm <- train(pull(y) ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired, 
                        data = train, 
                        method = "svmRadial", 
                        tuneLength = 10,
                        trControl = fitCtrl_2)
)

I have gone through other SO posts about this issue, but it is not clear to me what exactly I need to do to fix it.

【Comments】:

    Tags: r machine-learning r-caret


    【Answer 1】:

    It looks like you are doing classification. If so, the distribution should be "bernoulli", not "gaussian". Here is an example:

    library(caret)   # the gbm package must also be installed
    set.seed(111)
    
    # 100 x 16 random matrix; only the first 15 columns are named and used as predictors
    df = data.frame(matrix(rnorm(1600),ncol=16))[, 1:15]
    
    colnames(df) = c("duration", "nr.employed", "euribor3m", "pdays", "emp.var.rate", 
    "poutcome.success", "month.mar", "cons.conf.idx", "contact.telephone", 
    "contact.cellular", "previous", "age", "cons.price.idx", "month.jun", 
    "job.retired")
    
    df$y = factor(ifelse(runif(100)>0.5,"a","b"))
    
    # fitCtrl_2 is the trainControl() object defined in the question
    mod = as.formula("y ~  duration+nr.employed+euribor3m+pdays+emp.var.rate+poutcome.success+month.mar+cons.conf.idx+contact.telephone+contact.cellular+previous+age+cons.price.idx+month.jun+job.retired")
    
    model_gbm <- train(mod, data = df, 
                      method = "gbm",   
                      distribution="gaussian",   
                      metric = "ROC",
                      bag.fraction=0.75, 
                      trControl = fitCtrl_2)
    

    With "gaussian" you get the error:

    Error in { : task 1 failed - "inputs must be factors"
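    Why the message mentions factors: most likely, with distribution = "gaussian" caret's gbm wrapper returns numeric predictions instead of class labels, and twoClassSummary passes those predictions to caret's sensitivity() and specificity(), which reject non-factor inputs. The "Error in { : task 1 failed" wrapper comes from the foreach loop caret uses for resampling. A small sketch reproducing that check directly with caret's sensitivity(), on made-up toy vectors:

```r
library(caret)

# Observed classes for a toy two-class problem (made-up data)
obs <- factor(c("a", "b", "a", "b"))

# Numeric predictions, like those a "gaussian" gbm fit returns
pred_num <- c(0.8, 0.2, 0.6, 0.4)
res <- tryCatch(sensitivity(pred_num, obs),
                error = function(e) conditionMessage(e))
print(res)  # "inputs must be factors"

# Thresholding into a factor with the same levels makes the metric computable
pred_fac <- factor(ifelse(pred_num >= 0.5, "a", "b"), levels = levels(obs))
print(sensitivity(pred_fac, obs))  # 1
```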
    

    Set it to "bernoulli" and it works:

    model_gbm <- train(mod, data = df, 
                          method = "gbm",   
                          distribution="bernoulli",   
                          metric = "ROC",
                          bag.fraction=0.75, 
                          trControl = fitCtrl_2)
    
    model_gbm
    
    Stochastic Gradient Boosting 
    
    100 samples
     15 predictor
      2 classes: 'a', 'b' 
    
    No pre-processing
    Resampling: Cross-Validated (10 fold) 
    Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
    Resampling results across tuning parameters:
    
      interaction.depth  n.trees  ROC        Sens       Spec 
      1                   50      0.6338333  0.7233333  0.500
      1                  100      0.6093333  0.6533333  0.510
      1                  150      0.6193333  0.6500000  0.555
      2                   50      0.6445000  0.6900000  0.545
      2                  100      0.6138333  0.6166667  0.620
      2                  150      0.6085000  0.6700000  0.555
      3                   50      0.5770000  0.6466667  0.555
      3                  100      0.5756667  0.6066667  0.530
      3                  150      0.5808333  0.6300000  0.530
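    Alternatively, the distribution argument can simply be left out: for a two-level factor outcome, caret's gbm wrapper picks "bernoulli" on its own (and "multinomial" for more classes). A minimal sketch on simulated data; the data frame sim, the predictors x1 and x2, and ctrl are illustrative and not from the question. verbose = FALSE is passed through to gbm to silence its training log:

```r
library(caret)  # the gbm package must also be installed

set.seed(111)
# Simulated toy data: two numeric predictors and a two-class factor outcome
sim <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
sim$y <- factor(ifelse(runif(100) > 0.5, "a", "b"))

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

# No 'distribution' argument: caret chooses it from the outcome type
fit <- train(y ~ x1 + x2, data = sim, method = "gbm",
             metric = "ROC", trControl = ctrl, verbose = FALSE)

print(fit$finalModel$distribution$name)    # "bernoulli"
print(class(predict(fit, newdata = sim)))  # "factor"
```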
    

    【Comments】:

    • Thanks a lot @StupidWolf, it works with "bernoulli".