[Question title]: Linear SVM and extracting the weights
[Posted]: 2019-10-24 04:53:30
[Question]:

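I am practicing SVMs in R with the iris dataset, and I want to get the feature weights/coefficients out of my model. I think I may be misunderstanding something, because my output gives me 32 support vectors. I assumed I would get four, since I am analyzing four variables. I know there is a way to do this when using the svm() function, but I am trying to use the train() function from caret to build my SVM.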

library(caret)

# Define fitControl
fitControl <- trainControl(## 5-fold CV
              method = "cv",
              number = 5,
              classProbs = TRUE,
              summaryFunction = twoClassSummary )

# Define Tune
grid<-expand.grid(C=c(2^-5,2^-3,2^-1))

########## 
df <- iris
head(df)
df<-df[df$Species!='setosa',]
df$Species<-as.character(df$Species)
df$Species<-as.factor(df$Species)

# set random seed and run the model
set.seed(321)
svmFit1 <- train(x = df[-5],
                 y=df$Species,
                 method = "svmLinear", 
                 trControl = fitControl,
                 preProc = c("center","scale"),
                 metric="ROC",
                 tuneGrid=grid )
svmFit1

I thought it would just be svmFit1$finalModel@coef, but I get 32 values when I expected 4. Why is that?

[Comments]:

    Tags: r svm r-caret


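    [Answer 1]:

    So coef is not the weight vector W. It holds one value per support vector, which is why you see 32 of them rather than 4. Here is the relevant part of the docs for the ksvm class:

    coef: The corresponding coefficients times the training labels.

    To get what you are after, do the following: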

    coefs <- svmFit1$finalModel@coef[[1]]
    mat <- svmFit1$finalModel@xmatrix[[1]]
    
    coefs %*% mat
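
Why this product gives the weights can be sketched as follows. For a linear kernel the decision function is a weighted sum over the support vectors, and it collapses to a single weight vector with one entry per feature. The symbols below are notation introduced for this sketch, not names from kernlab: $n_{SV}$ is the number of support vectors (32 in the question), $p$ the number of features (4), $\alpha_i y_i$ the entries of coef, and $x_i$ the rows of xmatrix. The sign convention for the intercept $b$ varies between implementations, so treat it as an assumption.

```latex
\[
f(x) \;=\; \sum_{i=1}^{n_{SV}} \alpha_i y_i \,\langle x_i, x \rangle + b
     \;=\; \langle w, x \rangle + b,
\qquad
w \;=\; \sum_{i=1}^{n_{SV}} \alpha_i y_i \, x_i .
\]
% In matrix form, with c = (\alpha_1 y_1, \dots, \alpha_{n_{SV}} y_{n_{SV}})
% as a 1 x n_{SV} row and X as the n_{SV} x p matrix of support vectors:
\[
w^{\top} \;=\; c\,X \in \mathbb{R}^{1 \times p}.
\]
```

So a length-32 coefficient vector multiplied by a 32 x 4 matrix yields the 1 x 4 row of weights, one per variable.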
    

    See the reproducible example below.

    library(caret)
    #> Loading required package: lattice
    #> Loading required package: ggplot2
    #> Warning: package 'ggplot2' was built under R version 3.5.2
    
    # Define fitControl
    fitControl <- trainControl(
      method = "cv",
      number = 5,
      classProbs = TRUE,
      summaryFunction = twoClassSummary
    )
    
    # Define Tune
    grid <- expand.grid(C = c(2^-5, 2^-3, 2^-1))
    
    ########## 
    df <- iris 
    
    df<-df[df$Species != 'setosa', ]
    df$Species <- as.character(df$Species)
    df$Species <- as.factor(df$Species)
    
    # set random seed and run the model
    set.seed(321)
    svmFit1 <- train(x = df[-5],
                     y=df$Species,
                     method = "svmLinear", 
                     trControl = fitControl,
                     preProc = c("center","scale"),
                     metric="ROC",
                     tuneGrid=grid )
    
    coefs <- svmFit1$finalModel@coef[[1]]
    mat <- svmFit1$finalModel@xmatrix[[1]]
    
    coefs %*% mat
    #>      Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> [1,]   -0.1338791  -0.2726322    0.9497457    1.027411
    

    Created on 2019-06-11 by the reprex package (v0.2.1.9000)

    Sources

    1. https://www.researchgate.net/post/How_can_I_find_the_w_coefficients_of_SVM

    2. http://r.789695.n4.nabble.com/SVM-coefficients-td903591.html

    3. https://stackoverflow.com/a/1901200/6637133

    [Comments]:

      [Answer 2]:

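      As more people move from caret to tidymodels, I thought I would provide a tidymodels version of the solution above (as of August 2020), since I have not seen much discussion of this so far and it is not that straightforward to do.

      The main steps are outlined here, but see the links at the end for details on why it is done this way.

      1. Get the final model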

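      library(tidymodels)
      
      set.seed(2020)
      
      # Assuming a kernlab linear SVM; model_wf, train_folds, param_grid,
      # classification_measure and data are assumed to be defined beforehand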
      
      # Grid Search Parameters
      tune_rs <- tune_grid(
        model_wf,
        train_folds,
        grid = param_grid,
        metrics = classification_measure,
        control = control_grid(save_pred = TRUE)
      )
      
      # Finalise the workflow with the parameters that give the best accuracy
      best_accuracy <- select_best(tune_rs, metric = "accuracy")
      
      svm_wf_final <- finalize_workflow(
        model_wf,
        best_accuracy
      )
      
      # Fit your final model on all available data at the end of the experiment
      final_model <- fit(svm_wf_final, data)
      # fit() takes the finalized workflow (model spec plus preprocessing)
      # and the data to fit on, and runs the model fitting routine (parsnip)
      

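      2. Extract the ksvm object, pull out the required information, and compute variable importance

      ksvm_obj <- pull_workflow_fit(final_model)$fit
      # pull_workflow_fit() returns the parsnip model fit object
      # (newer versions of workflows deprecate it in favour of extract_fit_parsnip()).
      # $fit is the object produced by the underlying fitting function, which is
      # what we need here; its class depends on the engine.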
      
      coefs <- ksvm_obj@coef[[1]]
      # first bit of info we need are the coefficients from the linear fit
      
      mat <- ksvm_obj@xmatrix[[1]]
      # xmatrix that we need to matrix multiply against
      
      var_impt <- coefs %*% mat
      # var importance
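
The result of `coefs %*% mat` is a 1 x p matrix, so a small post-processing step helps when reading it as variable importance. The following is a minimal sketch in base R, not part of the original answer. The numbers are illustrative only (rounded from the output shown in Answer 1), and ranking by absolute weight is an assumption that is only reasonable when the features are on comparable scales, for example after centering and scaling.

```r
# Illustrative weights only: rounded from the output shown in Answer 1,
# standing in for the 1 x p matrix returned by `coefs %*% mat`.
var_impt <- matrix(
  c(-0.134, -0.273, 0.950, 1.027),
  nrow = 1,
  dimnames = list(NULL, c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width"))
)

# Drop the 1 x p matrix to a named numeric vector.
w <- var_impt[1, ]

# Rank features by absolute weight (assumes comparably scaled features).
ranked <- sort(abs(w), decreasing = TRUE)
print(ranked)
```

The sign of each weight still matters for interpretation: it tells you which class a larger feature value pushes the prediction toward, while the absolute value is used here only for ordering.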
      

      References:

      1. Extracting the weights of the support vectors with caret: Linear SVM and extracting the weights

      2. Variable importance (last section of the post): http://www.rebeccabarter.com/blog/2020-03-25_machine_learning/#finalize-the-workflow

      [Comments]:
