Predicting probabilities from a Cox PH model: answers

【Question title】: Predict probability from Cox PH model
【Posted】: 2015-06-23 03:10:04
【Question】:

I am trying to use a Cox model to predict the probability of failure after time (the variable named stop) 3.

library(survival)  # provides coxph(), Surv() and the bladder data

# keep recurrence numbers 1-4 (enum < 5)
bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)
range(predict(coxmodel, bladder1, type = "lp"))
range(predict(coxmodel, bladder1, type = "risk"))
range(predict(coxmodel, bladder1, type = "terms"))
range(predict(coxmodel, bladder1, type = "expected"))

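However, none of the outputs of the predict function fall in the 0-1 range. Is there a function for this, or how can I use the lp prediction and the baseline hazard function to compute probabilities?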

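【Comments】:

summary(survfit(coxmodel, newdata)) gives probabilities at the time points of the original model/data rather than those of the new data, so I am doubtful about using survfit with new data.

Tags: r survival-analysis cox-regression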

【Solution 1】:

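Please read the help page for predict.coxph. None of those are supposed to be probabilities. The linear predictor for a specific set of covariates is the log hazard ratio relative to a hypothetical (and very possibly non-existent) case with the mean of all the predictor values. The "expected" type comes closest to a probability, since it is a predicted number of events, but it would require specifying the time and then dividing by the number at risk at the beginning of observation.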
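For reference, the standard proportional-hazards relations connect the linear predictor to a probability. This is a sketch, not taken from the answer: here \(\mathrm{lp}(x)\) is the centered linear predictor and \(H_0(t)\) is the cumulative baseline hazard of the reference case where \(\mathrm{lp} = 0\) (in R this is presumably what basehaz() with its default centering estimates).

```latex
h(t \mid x) = h_0(t)\, e^{\mathrm{lp}(x)}, \qquad
H_0(t) = \int_0^t h_0(u)\, du, \qquad
S(t \mid x) = \exp\!\bigl(-H_0(t)\, e^{\mathrm{lp}(x)}\bigr), \qquad
P(T \le t \mid x) = 1 - S(t \mid x)
```

So the lp alone cannot yield a probability. It has to be combined with a baseline cumulative hazard evaluated at a chosen time.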

For the example given on the help page for predict, you can see that the sum of the predicted events is close to the actual number:

> sum(predict(fit,type="expected"), na.rm=TRUE)
[1] 163

> sum(lung$status==2)
[1] 165
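The "expected" values can also be turned into a probability for each subject. The predict.coxph help page states that the survival probability for a subject equals exp(-expected). The following is a minimal sketch under some assumptions: the data frame nd and the choice of three rows are hypothetical, and the follow-up time is fixed by overwriting the stop column, since type = "expected" needs the response columns in newdata.

```r
library(survival)

# Same model as in the question.
bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# Hypothetical new data: three existing rows, follow-up time forced to 3.
# type = "expected" requires the response columns (stop, event) in newdata.
nd <- bladder1[bladder1$enum == 1, ][1:3, ]
nd$stop <- 3

expected <- predict(coxmodel, newdata = nd, type = "expected")
p_fail <- 1 - exp(-expected)  # exp(-expected) is the survival probability
p_fail
```

Each element of p_fail is then a failure probability by time 3 for that row's covariates, which is a different quantity from dividing the expected counts by their sum.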

I suspect you may want to use the survfit function instead, since the probability of an event is 1 - probability of survival.

?survfit.coxph

Code for a similar question appears here: Adding column of predicted Hazard Ratio to dataframe after Cox Regression in R

Since you suggested using the bladder1 dataset, this would be the code for a specification of time=5:

 summary(survfit(coxmodel), time=5)
#------------------
Call: survfit(formula = coxmodel)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5    302      26    0.928  0.0141        0.901        0.956

This is returned as a list, with the survival prediction in a list element named $surv:

> str(summary(survfit(coxmodel), time=5))
List of 14
 $ n       : int 340
 $ time    : num 5
 $ n.risk  : num 302
 $ n.event : num 26
 $ conf.int: num 0.95
 $ type    : chr "right"
 $ table   : Named num [1:7] 340 340 340 112 NA 51 NA
  ..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
 $ n.censor: num 19
 $ surv    : num 0.928
 $ std.err : num 0.0141
 $ lower   : num 0.901
 $ upper   : num 0.956
 $ cumhaz  : num 0.0744
 $ call    : language survfit(formula = coxmodel)
 - attr(*, "class")= chr "summary.survfit"
> summary(survfit(coxmodel), time=5)$surv
[1] 0.9282944
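Applied to the question's setting (failure by time 3 for specific covariate values), the same idea could look like the sketch below. The data frame nd and the choice of the first three patients are hypothetical, picked only for illustration. Passing times = 3 to summary() reads each predicted curve at that time instead of listing every event time in the original data.

```r
library(survival)

# Same model as in the question.
bladder1 <- bladder[bladder$enum < 5, ]
coxmodel <- coxph(Surv(stop, event) ~ rx + size + number + cluster(id),
                  data = bladder1)

# Hypothetical covariate profiles: the first-recurrence rows of three patients.
nd <- bladder1[bladder1$enum == 1, ][1:3, ]

# One predicted survival curve per row of nd.
sf <- survfit(coxmodel, newdata = nd)

# Survival at time 3 for each profile; failure probability is its complement.
surv_at_3 <- summary(sf, times = 3)$surv
p_fail_by_3 <- 1 - as.numeric(surv_at_3)
p_fail_by_3
```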

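【Comments】:

Thanks for the answer! Is there any way to specify the follow-up time for type "expected"?
Thanks for the edit! It pointed me in the right direction for my final solution.
In the case of type = 'expected', can we divide each prediction by the sum (i.e. 163) and treat the results as probabilities?
I think you would need to post an example of such a calculation with a specific dataset in order to find what you want. You might want to sum the resulting probabilities to see how well they match the full dataset.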