ROC curve for a logistic regression that uses the "weights" argument in R's glm: answers

【Question title】: Perform ROC curve for logistic regression that uses the "weights" argument in R's glm
【Posted】: 2016-06-21 19:40:33
【Question】:

I am currently running a logistic regression that needs the "weights" argument of the glm function, as shown below:

model <- glm(cr ~ kw_url + row_number + domn * plform + score100, family = binomial, weights = weights, data = glm_data)

head(glm_data[cr>0 & cr <1]) 
   kw_url  plform row_number   domn score    cr weights score100
1:  other Desktop          0 ***  0.25 0.007407407     135       25
2:  other Desktop          0 d***  0.24 0.011494253      87       24
3:  other  Mobile          0 ***  0.14 0.001414427     707       14
4:  other  Mobile          1 ***  0.43 0.013888889     144       43
5:  other  Mobile          2 ***  0.38 0.027027027      37       38
6:  other  Mobile          1 ***  0.48 0.014285714      70       48

head(glm_data[cr>0 & cr <1,.(cr)]) # Dependent variable is a fraction, not 0 or 1
            cr
1: 0.007407407
2: 0.011494253
3: 0.001414427
4: 0.013888889
5: 0.027027027
6: 0.014285714

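I normally use the pROC or ROCR packages to draw ROC curves, but they require the dependent variable of the regression to be 0 or 1, not a fraction.

Because of this, I get the following error: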

library(ROCR)
> p <- predict(bayes_model, newdata=glm_data, type="response")
> pr <- prediction(p, glm_data$cr)
Error in prediction(p, glm_data$cr) : 
  Number of classes is not equal to 2.
ROCR currently supports only evaluation of binary classification tasks

So my question is: is there an R package that can produce a ROC curve and that supports R's glm function with weighted data?

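【Comments】:

A ROC curve is used to assess how well your model separates one class from another (or from several other classes, which you then treat as a single class). In this case you need a different metric or plot to assess performance, because you are not predicting classes. The weights are irrelevant here.
glm with weights and family=binomial is a logistic regression model with two classes, 1 and 0. The aggregated form is just a valid input format in which the data are still grouped; it is not a model for continuous data.
I see. In that case you could expand the data, e.g. for 0.007407407 with weight 135 add 134 zeros and a single 1, so that the data can be used with e.g. the ROC packages.
Expanding the data is exactly what I am trying to avoid, it is huge :)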
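
The point made in the comments, that a binomial glm on fractions with weights is the same model as a 0/1 logistic regression on the expanded rows, can be checked on a small toy data set. The sketch below uses made-up data and base R only; the names `agg`, `trials`, `succ` and `x` are illustrative and do not come from the question.

```r
# Toy aggregated data (hypothetical): 'trials' plays the role of 'weights',
# and 'succ / trials' the role of the fractional response 'cr'.
agg <- data.frame(x      = c(0.1, 0.4, 0.7, 1.0),
                  trials = c(20, 15, 10, 5),
                  succ   = c(2, 4, 5, 4))
agg$cr <- agg$succ / agg$trials

# Aggregated fit: fractional response plus the number of trials as weights.
fit_agg <- glm(cr ~ x, family = binomial, weights = trials, data = agg)

# Expanded fit: one 0/1 row per trial.
expanded <- data.frame(
  x = rep(agg$x, times = agg$trials),
  y = unlist(Map(function(s, n) rep(c(1, 0), times = c(s, n - s)),
                 agg$succ, agg$trials))
)
fit_exp <- glm(y ~ x, family = binomial, data = expanded)

# Largest absolute difference between the two sets of coefficient estimates.
coef_diff <- max(abs(coef(fit_agg) - coef(fit_exp)))
print(coef_diff)
```

Both fits estimate the same coefficients up to numerical tolerance, so the fitted probabilities from the aggregated model can be scored against the underlying 0/1 outcomes without materialising the expanded table.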

Tags: r roc

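【Solution 1】:

Then try this. It is not a package, but it should give you the ROC curve. prob is the fitted probability from the logistic regression. If this still produces too many points, plot a sample of them.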

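# Toy aggregated data: cr is the observed success fraction (must lie in [0, 1]),
# weight the number of trials, prob the fitted probability from the model.
d <- data.frame(cr = c(1/212, 1/142, 1/15*2, 10/16, 2/3), 
                weight = c(212, 142, 15, 16, 3), 
                prob = c(1/200, 1/100, 1/35, 1/2, .7))

d$y <- d$cr * d$weight          # successes (1s) per row
d$N <- (1 - d$cr) * d$weight    # failures (0s) per row
d <- d[order(d$prob, decreasing = TRUE), ]  # highest fitted probability first

# Cumulative share of successes / failures as the threshold is lowered
tpr <- c(0, cumsum(d$y) / sum(d$y))
fpr <- c(0, cumsum(d$N) / sum(d$N))

plot(fpr, tpr, type = 'b',
     xlab = "False positive rate", ylab = "True positive rate")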
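
To summarise a curve of this kind with a single number, the area under it can be computed with the trapezoidal rule directly from the aggregated counts, without expanding the data. The following is a minimal sketch in base R, assuming each row carries a fitted probability, an observed success fraction and a trial count. The helper `weighted_roc` and its argument names are hypothetical and not part of any package.

```r
# Hypothetical helper: ROC points and AUC from aggregated binomial data.
#   prob   - fitted probabilities (e.g. from predict(..., type = "response"))
#   cr     - observed success fraction per row
#   weight - number of trials per row
weighted_roc <- function(prob, cr, weight) {
  pos <- cr * weight
  neg <- (1 - cr) * weight
  # Pool rows that share the same fitted probability so ties form one step.
  pos <- tapply(pos, prob, sum)
  neg <- tapply(neg, prob, sum)
  # tapply orders groups by increasing prob; reverse for decreasing thresholds.
  pos <- rev(as.numeric(pos))
  neg <- rev(as.numeric(neg))
  tpr <- c(0, cumsum(pos) / sum(pos))
  fpr <- c(0, cumsum(neg) / sum(neg))
  # Trapezoidal rule for the area under the (fpr, tpr) curve.
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(fpr = fpr, tpr = tpr, auc = auc)
}

# Made-up example: three groups of 10 trials each.
roc <- weighted_roc(prob   = c(0.2, 0.5, 0.8),
                    cr     = c(0.1, 0.5, 0.9),
                    weight = c(10, 10, 10))
print(roc$auc)
```

With the question's data this could be called as `weighted_roc(p, glm_data$cr, glm_data$weights)`, where `p` is the vector of fitted probabilities; that call is an assumption about how the columns map onto the arguments.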

【Comments】: