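【Question Title】:Error when plotting multiclass ROC curve in R
【Posted】:2021-05-12 02:11:52
【Question】:

I built an SVM predictor that classifies samples into one of three groups: "good", "bad", or "ok". However, the test dataset only contains samples labeled "good" or "bad". I hit an error when trying to use multi_roc, and I'm not sure of the best way to resolve it. Here is an example: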

library(tidymodels)
library(mlbench)
library(multiROC)
data(Ionosphere)

# preprocess dataset
Ionosphere <- Ionosphere %>% select(-V1, -V2)

# split into training and test data
ion_split <- initial_split(Ionosphere, prop = 3/5)

ion_train <- training(ion_split)
ion_test <- testing(ion_split) 

# making an artificial third class in the training set for this example
ion_train[,33] <- as.character(ion_train[,33])
ion_train[1:7,33] <- "ok"
ion_train[,33] <- as.factor(ion_train[,33])

# make a recipe
iono_rec <-
  recipe(Class ~ ., data = ion_train)  %>%
  step_normalize(all_predictors()) 

# build the model and workflow
svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

svm_workflow <-
  workflow() %>%
  add_recipe(iono_rec) %>%
  add_model(svm_mod)

# run model tuning
set.seed(35)
recipe_res <-
  svm_workflow %>% 
  tune_grid(
    resamples = bootstraps(ion_train, times = 2),
    metrics = metric_set(roc_auc),
    control = control_grid(verbose = TRUE, save_pred = TRUE)
  )

# choose best model, finalise workflow
best_mod <- recipe_res %>% select_best(metric = "roc_auc")
final_wf <- finalize_workflow(svm_workflow, best_mod)
final_mod <- final_wf %>% fit(ion_train)

predict_res <- predict(
  final_mod,
  ion_test,
  type = "prob"
)

# attach the true class and build one-vs-rest indicator columns for multi_roc
results <- predict_res %>%
  cbind(ion_test$Class) %>%
  dplyr::rename(
    bad_pred_svm = .pred_bad,
    good_pred_svm = .pred_good,
    ok_pred_svm = .pred_ok,
    class = `ion_test$Class`
  ) %>%
  mutate(
    bad_true = ifelse(class == "bad", 1, 0),
    good_true = ifelse(class == "good", 1, 0),
    ok_true = ifelse(class == "ok", 1, 0)
  ) %>%
  dplyr::select(-class)

This produces a results data frame that looks like this:

  bad_pred_svm good_pred_svm ok_pred_svm bad_true good_true ok_true
1   0.01166109    0.92349066  0.06484826        0         1       0
2   0.82937620    0.07576908  0.09485472        1         0       0
3   0.05858563    0.88043189  0.06098248        0         1       0
4   0.91602211    0.04624037  0.03773753        1         0       0
5   0.91841475    0.04407115  0.03751410        1         0       0
6   0.01014520    0.94295540  0.04689940        0         1       0

When I try to pass it to multi_roc, I get an error:

multi_roc_svm <- multi_roc(results, force_diag = TRUE)

Error in approx(res_sp[[i]][[j]], res_se[[i]][[j]], all_sp, yleft = 1,  : 
  need at least two non-NA values to interpolate
In addition: Warning messages:
1: In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values
2: In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values

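I'm 99% sure this error occurs because my test data frame contains no samples of class "ok", but I don't know how to work around it. Can I plot the multiclass ROC curves manually?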
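The suspicion about the missing "ok" class can be checked without multiROC. The following is a minimal base R sketch, assuming (this is not verified against the multiROC source) that per-class sensitivity is computed as TP / (TP + FN). With no "ok" samples both counts are zero, every sensitivity is NaN, and `approx()` has nothing left to interpolate. The vectors `ok_true` and `ok_pred` are made-up illustration data, not values from the real test set.

```r
# Hypothetical one-vs-rest columns for a class with no positive test samples
ok_true <- c(0, 0, 0, 0, 0, 0)
ok_pred <- c(0.06, 0.09, 0.07, 0.04, 0.03, 0.05)

thresholds <- sort(unique(ok_pred), decreasing = TRUE)

# Sensitivity = TP / (TP + FN); both counts are 0 here, so every value is NaN
sens <- sapply(thresholds, function(t) {
  tp <- sum(ok_pred >= t & ok_true == 1)
  fn <- sum(ok_pred <  t & ok_true == 1)
  tp / (tp + fn)
})

# Specificity = TN / (TN + FP) is still well defined
spec <- sapply(thresholds, function(t) {
  tn <- sum(ok_pred <  t & ok_true == 0)
  fp <- sum(ok_pred >= t & ok_true == 0)
  tn / (tn + fp)
})

all_nan <- all(is.nan(sens))

# approx() discards NA/NaN pairs before interpolating, so nothing is left
msg <- tryCatch(
  {
    approx(spec, sens, xout = 0.5)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If `msg` matches the error text reported above, the empty class is a plausible cause, although the exact code path inside `multi_roc()` may differ.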

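【Comments】:

  • If you can create a reprex, it will help people understand the scope and cause of your problem and find an answer. That said, have you tried using yardstick::roc_curve() here? It works for multiclass outcomes.

Tags: r roc tidymodels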


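【Answer 1】:

I don't know which package multi_roc() is in, but the tidymodels solution is quite simple.

If you just want the ROC value from the multiclass ROC curve, you can use the yardstick functions: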

> predict_res %>% 
+     bind_cols(ion_test) %>% 
+     # or roc_curve(Class, .pred_bad)
+     roc_auc(Class, .pred_bad)
# A tibble: 1 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.976
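On the question of plotting the curves manually: below is a base R sketch of one-vs-rest ROC curves that skips any class with no positive (or no negative) samples in the test set. It is only an illustration under stated assumptions. The data frame re-types the six rows of `results` printed in the question, and `roc_points()`, `auc_trapezoid()`, and `plot_rocs()` are hypothetical helper names, not functions from yardstick or multiROC. The helpers also assume there are no tied scores.

```r
# First six rows of `results` from the question, re-typed by hand
results <- data.frame(
  bad_pred_svm  = c(0.01166109, 0.82937620, 0.05858563,
                    0.91602211, 0.91841475, 0.01014520),
  good_pred_svm = c(0.92349066, 0.07576908, 0.88043189,
                    0.04624037, 0.04407115, 0.94295540),
  ok_pred_svm   = c(0.06484826, 0.09485472, 0.06098248,
                    0.03773753, 0.03751410, 0.04689940),
  bad_true  = c(0, 1, 0, 1, 1, 0),
  good_true = c(1, 0, 1, 0, 0, 1),
  ok_true   = c(0, 0, 0, 0, 0, 0)
)

# Hypothetical helper: one-vs-rest ROC points for one class (no tied scores)
roc_points <- function(truth, score) {
  truth <- truth[order(score, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(truth == 0) / sum(truth == 0)),
    tpr = c(0, cumsum(truth == 1) / sum(truth == 1))
  )
}

# Hypothetical helper: trapezoidal area under the ROC points
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Hypothetical helper: draw one step curve per class on shared axes
plot_rocs <- function(rocs) {
  plot(c(0, 1), c(0, 1), type = "l", lty = 2,
       xlab = "1 - specificity", ylab = "sensitivity")
  for (i in seq_along(rocs)) {
    lines(rocs[[i]]$fpr, rocs[[i]]$tpr, type = "s", col = i + 1)
  }
  legend("bottomright", legend = names(rocs),
         col = seq_along(rocs) + 1, lty = 1)
}

classes <- c("bad", "good", "ok")

# Keep only classes that have both positives and negatives in the test set
usable <- Filter(function(cl) {
  truth <- results[[paste0(cl, "_true")]]
  any(truth == 1) && any(truth == 0)
}, classes)

rocs <- lapply(setNames(usable, usable), function(cl) {
  roc_points(results[[paste0(cl, "_true")]],
             results[[paste0(cl, "_pred_svm")]])
})
aucs <- sapply(rocs, auc_trapezoid)
print(aucs)
```

Calling `plot_rocs(rocs)` draws the curves for the classes that remain, here "bad" and "good". On these six rows both classes are perfectly separated, which says nothing about the full test set. Dropping the `ok_true` and `ok_pred_svm` columns before calling `multi_roc()` might avoid the error in the same way, but that is untested here.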

【Comments】:
