如何从 R 中的 survfit 中提取危害？答案

【问题标题】：How do I extract hazards from survfit in R?如何从 R 中的 survfit 中提取危害？
【发布时间】：2015-04-19 10:47:38
【问题描述】：

我有一个survfit 对象。对我的t=0:50 年感兴趣的摘要生存很容易。

summary(survfit, t=0:50)

它给出了每个 t 的生存率。

有没有办法获得每个 t 的危险（在这种情况下，每个 t=0:50 中从 t-1 到 t 的危险）？我想获得与 Kaplan Meier 曲线相关的风险的平均值和置信区间（或标准误差）。

当分布适合时，这似乎很容易做到（例如flexsurvreg 中的type="hazard"），但我不知道如何为常规的 survfit 对象执行此操作。建议？

【问题讨论】：

相关：stackoverflow.com/q/28543538/6574038

标签： r survival-analysis

【解决方案1】：

这有点棘手，因为风险是对瞬时概率的估计（这是离散数据），但basehaz 函数可能会有所帮助，但它只返回累积风险。所以你仍然需要执行一个额外的步骤。

muhaz 函数我也很幸运。从其文档中：

library(muhaz)
?muhaz
data(ovarian, package="survival")
attach(ovarian)
fit1 <- muhaz(futime, fustat)
plot(fit1)

我不确定获得 95% 置信区间的最佳方法，但自举可能是一种方法。

#Function to bootstrap hazard estimates
haz.bootstrap <- function(data,trial,min.time,max.time){
  library(data.table)
  data <- as.data.table(data)
  data <- data[sample(1:nrow(data),nrow(data),replace=T)]
  fit1 <- muhaz(data$futime, data$fustat,min.time=min.time,max.time=max.time)
  result <- data.table(est.grid=fit1$est.grid,trial,haz.est=fit1$haz.est)
  return(result)
}

#Re-run function to get 1000 estimates
haz.list <- lapply(1:1000,function(x) haz.bootstrap(data=ovarian,trial=x,min.time=0,max.time=744))
haz.table <- rbindlist(haz.list,fill=T)

#Calculate Mean,SD,upper and lower 95% confidence bands
plot.table <- haz.table[, .(Mean=mean(haz.est),SD=sd(haz.est)), by=est.grid]
plot.table[, u95 := Mean+1.96*SD]
plot.table[, l95 := Mean-1.96*SD]

#Plot graph
library(ggplot2)
p <- ggplot(data=plot.table)+geom_smooth(aes(x=est.grid,y=Mean))
p <- p+geom_smooth(aes(x=est.grid,y=u95),linetype="dashed")
p <- p+geom_smooth(aes(x=est.grid,y=l95),linetype="dashed")
p

【讨论】：

这个不错。我对data.table非常陌生。你能在 1 年内轻松地将 est.grid 作为一个序列来完成吗？此外，我正在我自己的数据中寻找 0:50 范围，但引导程序并不总是对最大时间进行采样，因此 plot.table 不会返回我需要的范围。你有什么建议吗？
我觉得我有点困惑。如果将 744 替换为 50（您要估计的上限）会发生什么？引导程序不选择每个样本中的最大值并不重要，因为所有估计的点最后都是平均的。也许如果您发布了更可重现的数据表示，我可能会更好地理解。

【解决方案2】：

作为对 Mike 的回答的补充，可以通过泊松分布而不是正态分布来模拟事件的数量。然后可以通过伽马分布计算危险率。代码将变为：

library(muhaz)
library(data.table)
library(rGammaGamma)
data(ovarian, package="survival")
attach(ovarian)
fit1 <- muhaz(futime, fustat)
plot(fit1)

#Function to bootstrap hazard estimates
haz.bootstrap <- function(data,trial,min.time,max.time){
  library(data.table)
  data <- as.data.table(data)
  data <- data[sample(1:nrow(data),nrow(data),replace=T)]
  fit1 <- muhaz(data$futime, data$fustat,min.time=min.time,max.time=max.time)
  result <- data.table(est.grid=fit1$est.grid,trial,haz.est=fit1$haz.est)
  return(result)
}

#Re-run function to get 1000 estimates
haz.list <- lapply(1:1000,function(x) haz.bootstrap(data=ovarian,trial=x,min.time=0,max.time=744))
haz.table <- rbindlist(haz.list,fill=T)

#Calculate Mean, gamma parameters, upper and lower 95% confidence bands
plot.table <- haz.table[, .(Mean=mean(haz.est),
                            Shape = gammaMME(haz.est)["shape"],
                            Scale = gammaMME(haz.est)["scale"]), by=est.grid]
plot.table[, u95 := qgamma(0.95,shape = Shape + 1, scale = Scale)]
# The + 1 is due to the discrete character of the poisson distribution.
plot.table[, l95 := qgamma(0.05,shape = Shape, scale = Scale)]

#Plot graph
ggplot(data=plot.table) + 
  geom_line(aes(x=est.grid, y=Mean),col="blue") + 
  geom_ribbon(aes(x=est.grid, y=Mean, ymin=l95, ymax=u95),alpha=0.5, fill= "lightblue")

可以看出，对危险率下限的负估计现在已经消失了。

【讨论】：

不错的答案，但请注意，双面 CI 是使用 1 - alpha/2 计算的！

【解决方案3】：

为了性能，我们可以使用更精简的引导函数。

## define custom times
t0 <- 0
t1 <- 744

## bootstrap fun
boot_fun <- function(x) {
  n <- dim(x)[1]
  x <- x[sample.int(n, n, replace=TRUE), ]
  muhaz::muhaz(x$futime, x$fustat, min.time=t0, max.time=t1)
}

# bootstrap
set.seed(42)
R <- 999
B <- replicate(R, boot_fun(ovarian))

结果可以手动计算。

## extract matrix from bootstrap
r <- `colnames<-`(t(array(unlist(B[3, ]), dim=c(101, R))), B[2, ][[1]])

## calculate result
library(matrixStats)  ## for fast matrix calculations
r <- cbind(x=as.numeric(colnames(r)), 
           y=colMeans2(r),
           shape=(colMeans2(r)/colSds(r))^2, 
           scale=colVars(r)/colMeans2(r))
r <- cbind(r[, 1:2], 
           lower=qgamma(0.025, shape=r[, 'shape'] + 1, scale=r[, 'scale']),
           upper=qgamma(0.975, shape=r[, 'shape'], scale=r[, 'scale']))

给予

head(r)
#          x            y        lower       upper
# [1,]  0.00 0.0003836816 9.359267e-05 0.001400539
# [2,]  7.44 0.0003913992 9.746868e-05 0.001387551
# [3,] 14.88 0.0003997275 1.018897e-04 0.001374005
# [4,] 22.32 0.0004087439 1.069353e-04 0.001360212
# [5,] 29.76 0.0004178464 1.123697e-04 0.001346187
# [6,] 37.20 0.0004275531 1.184685e-04 0.001332237

range(r[, 'y'])
# [1] 0.0003836816 0.0011122910

情节

matplot(r[, 1], r[, -1], type='l', lty=c(1, 2, 2), col=4, 
        xlab='Time', ylab='Hazard Rate', main='Hazard Estimates')
legend('topleft', legend=c('estimate', '95% CI'), col=4, lty=1:2, cex=.8)

数据

data(cancer, package="survival")  ## loads `ovarian` data set

【讨论】：