R survival survreg not producing a good fit

【Question title】: R survival survreg not producing a good fit
【Posted】: 2017-09-11 23:10:40
【Question】:

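I am new to R and am trying to use survival analysis to find a correlation in censored data. The x data are the envelope masses of protostars. The y data are the intensities of an observed molecular line, and some of the values are upper limits. The data are: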

x <- c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018)
y <- c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42)
censor <- c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)

I am using the function survreg from the R survival library:

modeldata<-survreg(formula=Surv(y,censor)~x, dist="exponential", control = list(maxiter=90))

The result is:

summary(modeldata)

Call:
survreg(formula = Surv(y, censor) ~ x, dist = "exponential", 
    control = list(maxiter = 90))
             Value Std. Error     z     p
(Intercept) -0.114      0.568 -0.20 0.841
x            0.153      0.110  1.39 0.163

Scale fixed at 1 

Exponential distribution
Loglik(model)= -6.9   Loglik(intercept only)= -9
Chisq= 4.21 on 1 degrees of freedom, p= 0.04 
Number of Newton-Raphson Iterations: 5 
n= 11

However, when I plot the data and the model using:

plot(x,y,pch=(censor+1))
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red")

I get this: [figure: plot of x and y data; triangles are censored data]

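I am not sure where I am going wrong. I have tried different distributions, but they all produce similar results. The same happens when I use other data, for example: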

x <- c(1.14, 1.14, 1.19, 0.78, 0.43, 0.24, 0.19, 0.16, 0.17, 0.66, 0.40)

I am also not sure whether I am interpreting the results correctly.

I have tried other examples using the same method (e.g. https://stats.idre.ucla.edu/r/examples/asa/r-applied-survival-analysis-ch-1/), and as far as I can tell they work fine.

So my questions are:

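Am I using the right function to fit the data? If not, which one would be more appropriate?
If it is the right function, why does the model not fit the data more closely? Is it something to do with the plotting?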

Thank you for your help.

【Comments】:

I think it is more accurate to say that you have truncated data. Also, the distribution looks more logarithmic than exponential.

Tags: r regression survival-analysis

【Solution 1】:

The "shape" of the relationship looks concave downward, so I would guess that a ~ log(x) fit might be more appropriate:

dfrm <- data.frame( x = c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018),
y = c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42),
censor= c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1))

modeldata<-survreg(formula=Surv(y,censor)~log(x), data=dfrm, dist="loggaussian", control = list(maxiter=90))

Your plotting code seems appropriate:

png(); plot(y~x,pch=(censor+1),data=dfrm)
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red"); dev.off()
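To see what the red line represents in each case, here is a minimal sketch that uses only the coefficients reported in this thread (no refitting). It assumes that predict() on a survreg fit, with its default type = "response", returns exp(linear predictor); the variable names are made up for illustration.

```r
# Coefficients copied from the two printouts in this thread
exp_b0 <- -0.114;      exp_b1 <- 0.153        # Surv(y, censor) ~ x, dist = "exponential"
ln_b0  <- 0.02092589;  ln_b1  <- 0.32536509   # Surv(y, censor) ~ log(x), dist = "loggaussian"

xnew <- c(1, 5, 10, 20, 30)

# Assumption: the predicted response is exp(linear predictor) for both fits
pred_exponential <- exp(exp_b0 + exp_b1 * xnew)
pred_lognormal   <- exp(ln_b0 + ln_b1 * log(xnew))

# The log(x) model is algebraically a power law in x: exp(b0) * x^b1
pred_power_law <- exp(ln_b0) * xnew^ln_b1

print(data.frame(x = xnew,
                 exponential_fit = round(pred_exponential, 2),
                 lognormal_fit   = round(pred_lognormal, 2)))
```

Under that assumption the ~ x model grows exponentially with x, so at x = 30 its curve is far above the largest observed y (2.37), whereas the ~ log(x) model stays within the range of the data.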

modeldata
Call:
survreg(formula = Surv(y, censor) ~ log(x), data = dfrm, dist = "loggaussian", 
    control = list(maxiter = 90))

Coefficients:
(Intercept)      log(x) 
 0.02092589  0.32536509 

Scale= 0.7861798 

Loglik(model)= -6.6   Loglik(intercept only)= -8.8
    Chisq= 4.31 on 1 degrees of freedom, p= 0.038 
n= 11
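On interpreting the last lines of the printout: the Chisq value is a likelihood-ratio statistic comparing the fitted model with the intercept-only model, and the p-value is its upper tail under a chi-squared distribution. A short sketch using only the numbers printed above (the variable names are illustrative):

```r
# Values copied from the printout above
chisq <- 4.31
df <- 1

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
print(round(p_value, 3))

# The same statistic rebuilt from the printed log-likelihoods:
# 2 * (Loglik(model) - Loglik(intercept only))
loglik_model <- -6.6
loglik_null  <- -8.8
chisq_from_rounded <- 2 * (loglik_model - loglik_null)
print(chisq_from_rounded)
```

The rebuilt statistic differs slightly from the printed 4.31 because the printout rounds the log-likelihoods to one decimal place.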

【Comments】:

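Thank you for the reply, it worked. I did notice that in my case formula = Surv(y, censor) ~ x works better than formula = Surv(y, censor) ~ log(x); I am still trying to work out why. Either way, your answer solved my problem.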
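One more thing that may be worth checking is how the censoring is coded. The question says some y values are upper limits and that the triangles (censor == 1) are the censored points, whereas Surv(time, event) treats 1 as an observed event and assumes right censoring unless told otherwise. The following is a sketch rather than a verified fix: it assumes censor == 1 really does mark an upper limit, and the column name detected and the object name fit_left are made up for illustration.

```r
library(survival)

dfrm <- data.frame(
  x = c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018),
  y = c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42),
  censor = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
)

# Assumption: censor == 1 marks an upper limit (the triangles in the plot),
# so the indicator Surv() expects is the opposite: 1 = actually measured.
dfrm$detected <- as.integer(dfrm$censor == 0)

# type = "left" declares that an undetected value lies somewhere below y
fit_left <- survreg(Surv(y, detected, type = "left") ~ log(x),
                    data = dfrm, dist = "lognormal")
print(fit_left)
```

If the assumption about the coding is wrong (that is, if censor == 1 means a measured value), the original Surv(y, censor) indicator is already the right way round and only the type argument would need a second look.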