【Question title】: R caret: Combine rfe() and train()
【Posted】: 2019-06-02 22:41:18
【Question】:

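I want to combine recursive feature elimination via rfe() with model selection via trainControl(), using the rf (random forest) method. Instead of the default summary statistics, I would like the MAPE (mean absolute percentage error). So I tried the following code with the ChickWeight dataset: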

library(caret)
library(randomForest)
library(MLmetrics)

# Compute MAPE instead of other metrics
mape <- function(data, lev = NULL, model = NULL){
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}

# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))

# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet, 
           sizes=c(1:3), # subset sizes (numbers of predictors) that rfe should evaluate
           data = ChickWeight, 
           method="rf",
           ntree = 250, 
           metric= "RMSE", 
           tuneGrid=tunegrid,
           rfeControl=rfec,
           trControl = trc)

The code runs without errors. But where can I find the MAPE that I defined as summaryFunction in trainControl? Is trainControl executed or ignored?

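How can I rewrite the code so that rfe performs the recursive feature elimination and trainControl is then used inside rfe to tune the hyperparameter mtry, while also computing an additional error measure (MAPE)?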

【Comments】:

    Tags: r random-forest r-caret


    【Solution 1】:

    trainControl is ignored, as its description, "Control the computational nuances of the train function", would suggest. To use MAPE, you need

    rfec$functions$summary <- mape
    

    and then

    rfe(weight ~ Time + Chick + Diet, 
        sizes = c(1:3),
        data = ChickWeight, 
        method ="rf",
        ntree = 250, 
        metric = "MAPE", # Modified
        maximize = FALSE, # Modified
        rfeControl = rfec)
    #
    # Recursive feature selection
    #
    # Outer resampling method: Cross-Validated (10 fold) 
    #
    # Resampling performance over subset size:
    #
    #  Variables   MAPE  MAPESD Selected
    #          1 0.1903 0.03190         
    #          2 0.1029 0.01727        *
    #          3 0.1326 0.02136         
    #         53 0.1303 0.02041         
    #
    # The top 2 variables (out of 2):
    #    Time, Chick.L
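
The summary function that rfe() calls receives a data frame with obs and pred columns and must return a named numeric vector; the name is what metric = "MAPE" refers to, and maximize = FALSE is needed because a smaller MAPE is better. A minimal base-R sketch of that contract follows. The name mape_summary and the toy data are made up for illustration, and the formula is assumed to match MLmetrics::MAPE:

```r
# Hand-written MAPE summary; assumed equivalent to MLmetrics::MAPE(y_pred, y_true).
# mape_summary is a hypothetical name, not part of caret or MLmetrics.
mape_summary <- function(data, lev = NULL, model = NULL) {
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Toy observed/predicted values, invented for this example
toy <- data.frame(obs = c(100, 200, 400), pred = c(110, 180, 400))

res <- mape_summary(toy)
print(names(res))   # the name that `metric` has to match
print(unname(res))  # the MAPE value itself
```

If the returned vector is not named, or the name differs from the value passed as metric, the lookup of the metric column is likely to fail.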
    

    【Discussion】:

    • If the rfe() function ignores trainControl, then consider removing trControl = trc and tuneGrid = tunegrid. Removing these settings does not change the results above.
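
The second part of the question (tuning mtry inside rfe) is not covered above. One possible approach, sketched here under assumptions rather than taken from the answer, is to start from caretFuncs, whose fit step goes through train(), and to replace its fit function so that each predictor subset is tuned with its own trainControl. rfFuncs appears to call randomForest directly, which would explain why trControl and tuneGrid have no effect there. The names nested_funcs, inner_ctrl and grid are hypothetical, the fold counts are reduced to keep the run time manageable, and mape is rewritten in base R on the assumption that it matches MLmetrics::MAPE:

```r
library(caret)
library(randomForest)

# MAPE summary in base R (same interface as the question's mape())
mape <- function(data, lev = NULL, model = NULL) {
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Start from caretFuncs, whose helpers expect a train() object
nested_funcs <- caretFuncs
nested_funcs$summary <- mape  # outer (rfe) resampling reports MAPE
nested_funcs$fit <- function(x, y, first, last, ...) {
  # Inner resampling: tune mtry for the current predictor subset
  inner_ctrl <- trainControl(method = "cv", number = 5, summaryFunction = mape)
  # Cap mtry at the number of predictors in this subset
  grid <- data.frame(mtry = unique(pmin(1:3, ncol(x))))
  train(x, y, method = "rf", ntree = 250,
        metric = "MAPE", maximize = FALSE,
        tuneGrid = grid, trControl = inner_ctrl)
}

rfec <- rfeControl(functions = nested_funcs, method = "cv", number = 5)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet, data = ChickWeight,
               sizes = 1:3, metric = "MAPE", maximize = FALSE,
               rfeControl = rfec)

results$results       # MAPE per subset size from the outer resampling
results$fit$bestTune  # mtry chosen by train() for the final subset
```

Because every outer fold and subset size triggers a full inner tuning run, this is considerably slower than the rfFuncs version; whether the extra cost is worthwhile depends on how sensitive the model is to mtry.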