Answers to: Problem in calculating Area under curve in R

【Question title】: Problem in calculating Area under curve in R
【Posted】: 2022-01-23 03:24:45
【Question】:

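I have a dataset with 50 samples, which I split into a training dataset and a test dataset. I applied an SVM to the training dataset and predicted a model.

Below you can find the svm column, which comes from the training data, and the Predicted column, which comes from the test data.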

data <- structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4", 
"Sample5", "Sample6", "Sample7", "Sample8", "Sample9", "Sample10", 
"Sample11", "Sample12", "Sample13", "Sample14", "Sample15", "Sample16", 
"Sample17", "Sample18", "Sample19", "Sample20", "Sample21", "Sample22", 
"Sample23", "Sample24", "Sample25", "Sample26", "Sample27", "Sample28", 
"Sample29", "Sample30", "Sample31", "Sample32", "Sample33", "Sample34", 
"Sample35", "Sample36", "Sample37", "Sample38", "Sample39", "Sample40", 
"Sample41", "Sample42", "Sample43", "Sample44", "Sample45", "Sample46", 
"Sample47", "Sample48", "Sample49"), svm = c("typeA", "typeA", 
"typeA", "typeB", "typeB", "typeB", "typeB", "typeB", "typeA", 
"typeB", "typeA", "typeB", "typeA", "typeB", "typeA", "typeB", 
"typeB", "typeB", "typeA", "typeA", "typeB", "typeA", "typeB", 
"typeA", "typeB", "typeA", "typeA", "typeA", "typeA", "typeA", 
"typeA", "typeB", "typeB", "typeB", "typeB", "typeB", "typeB", 
"typeB", "typeA", "typeB", "typeA", "typeB", "typeB", "typeA", 
"typeA", "typeA", "typeA", "typeA", "typeB"), Predicted = c("typeA", 
"typeA", "typeA", "typeB", "typeB", "typeB", "typeB", "typeB", 
"typeA", "typeB", "typeA", "typeA", "typeA", "typeB", "typeA", 
"typeB", "typeB", "typeB", "typeA", "typeA", "typeB", "typeA", 
"typeB", "typeA", "typeB", "typeA", "typeA", "typeA", "typeA", 
"typeA", "typeA", "typeB", "typeB", "typeB", "typeB", "typeA", 
"typeB", "typeB", "typeA", "typeA", "typeB", "typeB", "typeB", 
"typeA", "typeA", "typeA", "typeA", "typeA", "typeB")), row.names = c(NA, 
-49L), class = "data.frame")

I added a pred2 column as follows:

data$pred2 <- ifelse(data$svm=="typeA", 1, 0)

I used the pROC package to get the AUC.

library(pROC)
res.roc <- roc(data$Predicted, data$pred2)
plot.roc(res.roc, print.auc = TRUE, main="")

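I have seen several posts saying that AUC (area under the curve) tells more about a model's performance than accuracy does.

I am confused about whether what I calculated is the true AUC or just the accuracy. Can anyone tell me whether this is correct? Is it enough to check the performance of the model?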

【Question discussion】:

Tags: r classification svm roc

【Solution 1】:

I think this question would be better suited to Cross Validated, but accuracy != AUC.

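Here is an article that describes the differences, along with some other metrics that may be better for evaluating the performance of machine learning algorithms: https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc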

The short of it is that accuracy requires choosing a cutoff value, whereas AUC does not.
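To make the cutoff point concrete, here is a minimal base R sketch. The `truth` and `score` vectors are made up for illustration (they are not the data from the question), and it assumes the classifier can output a continuous score where higher means more likely "typeA". Accuracy changes with the cutoff, while the AUC is computed from the ranking of the scores alone.

```r
# Hypothetical data for illustration only (not the data from the question):
# true classes and continuous classifier scores, higher = more "typeA".
truth <- c("typeA", "typeA", "typeA", "typeA", "typeB", "typeB", "typeB", "typeB")
score <- c(0.9, 0.8, 0.6, 0.35, 0.7, 0.4, 0.3, 0.1)

# Accuracy needs a cutoff to turn scores into class labels.
accuracy_at <- function(cutoff) {
  predicted <- ifelse(score >= cutoff, "typeA", "typeB")
  mean(predicted == truth)
}
acc <- sapply(c(0.3, 0.5, 0.7), accuracy_at)

# AUC needs no cutoff: it is the share of (typeA, typeB) pairs in which
# the typeA sample gets the higher score, with ties counted as one half.
pos <- score[truth == "typeA"]
neg <- score[truth == "typeB"]
diffs <- outer(pos, neg, "-")
auc <- mean((diffs > 0) + 0.5 * (diffs == 0))

print(acc)
print(auc)
```

With these made-up scores the accuracy is 0.625, 0.75 and 0.625 at the three cutoffs, while the AUC is a single value, 0.8125.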

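The pROC package uses the trapezoid rule to calculate the AUC. Take a look at the help page for the pROC::auc function; it has a lot of information and references.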
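As a rough illustration of the trapezoid rule, the sketch below builds the ROC points by hand in base R and sums the trapezoids between them. It is not pROC's internal code, and the `truth` and `score` vectors are hypothetical. The last few lines cross-check against pROC only if the package is installed.

```r
# Hypothetical labels and scores, for illustration only.
truth <- c(1, 1, 1, 1, 0, 0, 0, 0)   # 1 = typeA, 0 = typeB
score <- c(0.9, 0.8, 0.6, 0.35, 0.7, 0.4, 0.3, 0.1)

# Sweep the cutoff from high to low, one step per distinct score.
# Starting at Inf gives the (0, 0) corner of the ROC curve.
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(cutoffs, function(ct) mean(score[truth == 1] >= ct))
fpr <- sapply(cutoffs, function(ct) mean(score[truth == 0] >= ct))

# Trapezoid rule: width times average height between adjacent ROC points.
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_trapezoid)

# Optional cross-check with pROC, run only if the package is available.
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(response = truth, predictor = score,
                       levels = c(0, 1), direction = "<")
  print(pROC::auc(roc_obj))
}
```

Note that `response` is the true class and `predictor` is the continuous score. Setting `levels` and `direction` explicitly avoids relying on pROC's automatic choice of which class counts as the control.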

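【Discussion】:

Thank you very much. Could you show me how to calculate the AUC from the data given above?
Added this to the answer.
Thank you very much!!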