【Question title】:Difference between importance() (randomForest) and RandomForest$importance
【Posted】:2018-08-20 11:51:31
【Question】:

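I don't understand the difference between the importance() function (randomForest package) and the importance values stored in a random forest model:

I fitted a simple RF classification model and tried to get the variable importance with the following code: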

 rf_model$importance
         0               1      MeanDecreaseAccuracy    MeanDecreaseGini
 X1  0.096886458    0.032546101    0.055488009             2472.172207
 X2  0.030985037    0.025615202    0.027530078             1338.378297
 X3  0.124302743    0.012551971    0.052402188             3091.891586

importance(rf_model)
            0            1      MeanDecreaseAccuracy    MeanDecreaseGini
 X1 159.9149603    175.6265625        242.424683          2472.172207
 X2 104.8273654    97.09338154        129.5084398         1338.378297
 X3 157.0207876    86.93847182        216.6374153         3091.891586

Why do the first three columns of the two outputs differ, while MeanDecreaseGini is the same?

【Comments】:

    Tags: r random-forest


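    【Solution 1】:

    By default, importance(rf_model) is called with scale = TRUE, which divides each permutation-based measure (the per-class columns and MeanDecreaseAccuracy) by its "standard error". The node-impurity measure (MeanDecreaseGini) is not scaled, which is why that column is identical in both outputs. Consider this example: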

    library(randomForest)
    set.seed(4543)
    data(mtcars)
    mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                              keep.forest=FALSE, importance=TRUE)
    
    mtcars.rf$importance
    #output
            %IncMSE IncNodePurity
    cyl   7.3939431     162.38777
    disp 10.0468306     257.46627
    hp    7.6801388     200.22729
    drat  1.0921653      65.96165
    wt    9.7998328     250.94940
    qsec  0.6066792      38.52055
    vs    0.7048540      24.75183
    am    0.6201962      17.27180
    gear  0.4110634      16.33811
    carb  1.0549523      27.47096
    

    This is the same as:

    importance(mtcars.rf, scale = FALSE)
            %IncMSE IncNodePurity
    cyl   7.3939431     162.38777
    disp 10.0468306     257.46627
    hp    7.6801388     200.22729
    drat  1.0921653      65.96165
    wt    9.7998328     250.94940
    qsec  0.6066792      38.52055
    vs    0.7048540      24.75183
    am    0.6201962      17.27180
    gear  0.4110634      16.33811
    carb  1.0549523      27.47096
    
    With the default (scale = TRUE):
    importance(mtcars.rf)
           %IncMSE IncNodePurity
    cyl  15.767986     162.38777
    disp 19.885128     257.46627
    hp   18.177916     200.22729
    drat  7.002942      65.96165
    wt   18.479239     250.94940
    qsec  5.022593      38.52055
    vs    4.427525      24.75183
    am    6.435329      17.27180
    gear  3.968845      16.33811
    carb  8.207903      27.47096
    

    Finally:

    importance(mtcars.rf, scale = FALSE)[,1]/mtcars.rf$importanceSD
          cyl      disp        hp      drat        wt      qsec        vs        am      gear      carb 
    15.767986 19.885128 18.177916  7.002942 18.479239  5.022593  4.427525  6.435329  3.968845  8.207903
    

    This is equal to importance(mtcars.rf)[,1]:

    all.equal(importance(mtcars.rf, scale = FALSE)[,1]/mtcars.rf$importanceSD,
              importance(mtcars.rf)[,1])
    #output
    TRUE
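
The example above is a regression forest, whereas the question uses a classification model. As a rough sketch of the same check for classification, the code below uses the built-in iris data (an illustrative choice, not the asker's data) and assumes that importanceSD is a matrix whose columns line up with the first columns of the importance matrix (one per class, then MeanDecreaseAccuracy). The guard against near-zero standard errors is a defensive assumption of this sketch and should not matter for this data.

```r
library(randomForest)

set.seed(4543)
# Classification forest; importance = TRUE is required for permutation measures
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                        importance = TRUE)

raw    <- importance(iris.rf, scale = FALSE)  # unscaled values, as in iris.rf$importance
scaled <- importance(iris.rf)                 # default: scale = TRUE

# Permutation-based columns: one per class plus MeanDecreaseAccuracy
perm_cols <- seq_len(ncol(iris.rf$importanceSD))

# Defensive guard (assumption of this sketch): do not divide by a ~0 standard error
sd_safe <- ifelse(iris.rf$importanceSD < .Machine$double.eps,
                  1, iris.rf$importanceSD)

# Manual scaling of the permutation-based columns
manual <- raw[, perm_cols] / sd_safe

# Compare the manual scaling with the default output
all.equal(manual, scaled[, perm_cols], check.attributes = FALSE)

# The Gini column should be unaffected by the scale argument
all.equal(raw[, "MeanDecreaseGini"], scaled[, "MeanDecreaseGini"])
```

If both comparisons hold, this mirrors the pattern in the question: the per-class and MeanDecreaseAccuracy columns change with scaling, while MeanDecreaseGini does not.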
    

    【Comments】:
