Plotting "staircases" using ggplot2/plotly: answers

【Question title】: Plotting "staircases" using ggplot2/plotly
【Posted】: 2021-04-03 08:50:34
【Question description】:

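I am trying to follow this tutorial: https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/ (bottom of the page).

I modified the tutorial's code slightly and plotted the "staircases" (i.e. the "survival functions", shown in red, blue, and green in the plot below) for 3 observations: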

library(survival)
library(dplyr)
library(ranger)
library(data.table)
library(ggplot2)
library(plotly)

a = na.omit(lung)
a$ID <- seq_along(a[,1])

r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = a, mtry = 4, 
importance = "permutation", splitrule = "extratrees", verbose = TRUE)

death_times <- r_fit$unique.death.times
surv_prob  <-data.frame(r_fit$survival)
avg_prob <- sapply(surv_prob, mean)

plot(r_fit$unique.death.times, r_fit$survival[1,], type = "l", ylim = c(0,1), col = "red", xlab = "Days", ylab = "survival", main = "Survival Curves")

new = a[1:3,]

pred <- predict(r_fit, new, type = 'response')$survival
pred <- data.table(pred)
colnames(pred) <- as.character(r_fit$unique.death.times)

plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red")

lines(r_fit$unique.death.times, r_fit$survival[2,], type = "l", col = "green")

lines(r_fit$unique.death.times, r_fit$survival[3,], type = "l", col = "blue")

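From here, I would like to make the plot above "interactive", so that when you move the mouse over one of the curves:

1) the "attributes" belonging to that curve (from the object "a") appear on hover (e.g. ID, age, sex, ph.ecog, etc.)
2) the same hover box as in 1) also shows the x coordinate (from r_fit$unique) and the y coordinate (from "pred") at each position the mouse hovers over (for the given curve)

My plan was to first convert the "grob" object to a "ggplot" object, and then convert the "ggplot" object to a "plotly" object: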

grob = plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red")
basic_plot = ggpubr::as_ggplot(grob)

But when I try to inspect "basic_plot", it shows up as "NULL".

ggplot(f)
Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class gg/ggplot

If this had worked, I would eventually have converted the ggplot object to plotly:

plotly_plot = ggplotly(final_plot)

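How can I make this interactive plot?

I am trying to achieve something similar to this: https://plotly.com/python/v3/ipython-notebooks/survival-analysis-r-vs-python/ (near the bottom of the page, under the heading "Lifespans of different tumor DNA profiles")

(Please note: I am working on a computer with no USB port or internet connection, only R and a few preinstalled libraries... I do not have "ggplotify" or "survminer".)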

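【Question discussion】:

Base plots do not work as objects the way ggplot plots do. You may need as.grob: cran.r-project.org/web/packages/ggplotify/vignettes/…. Or have you tried building the plot in ggplot or plotly from the start?
Unfortunately, I do not have ggplotify on my work computer (no internet, no USB port).
Ah, in that case, maybe build the plot in ggplot and convert it with ggplotly (or build it directly in plotly). I can't check this on my phone right now, but I suspect the issue is explained at stackoverflow.com/a/29583945/10142537. Maybe grob = plot() returns NULL?
OK, that was the problem, and I have now added an answer. The ggplot code I used is a basic example that I can improve if you like. It would suit ggplot syntax better to have the data in a single (long) data frame and to use one geom_line call instead of 3!
Is it possible to change the answer so that: p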

Tags: r ggplot2 plot plotly data-visualization

【Solution 1】:

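The problem is that base graphics draw directly on a device. Your line grob = plot(r_fit$unique.death.times, pred[1,], type = "l", col = "red") therefore creates a NULL object (unlike ggplot, which returns a plot object).

You can build the plot directly in ggplot (there are several ways to do this, but I have made a simple example below) and convert it with ggplotly: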
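A minimal sketch of that difference, using made-up toy values rather than the survival data. The ggplot2 part is guarded so that it is skipped if the package is not installed:

```r
# Base graphics: plot() draws on the active device and returns NULL.
pdf(NULL)                           # null device, so no file or window is created
grob <- plot(1:10, type = "s")      # the value of plot() is NULL
invisible(dev.off())
base_is_null <- is.null(grob)       # TRUE: there is nothing to convert later

# ggplot2: ggplot() returns an object that can be stored and converted.
gg_is_object <- NA
if (requireNamespace("ggplot2", quietly = TRUE)) {
  toy <- data.frame(x = 1:10, y = 10:1)   # toy data, for illustration only
  gg <- ggplot2::ggplot(toy, ggplot2::aes(x = x, y = y)) + ggplot2::geom_step()
  gg_is_object <- inherits(gg, "ggplot")
}
```

This is why passing the result of plot() on to another function cannot work: only the ggplot object carries the plot with it.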

library(tidyr)  # provides pivot_longer()

# one column per curve (wide format)
fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])

# reshape to long format: one row per (time, curve) pair
fig_dat_long <- fig_dat %>% pivot_longer(-time, names_to = "pred_fit", values_to = "pred_fit_values")

gg_p <- ggplot(fig_dat_long, aes(x = time, y = pred_fit_values, colour = pred_fit)) +
  geom_line()

ggplotly(gg_p)
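Since pivot_longer() comes from tidyr, which may not be among the preinstalled libraries on an offline machine, here is a hedged sketch of the same wide-to-long reshaping using only base R. The numbers are invented stand-ins for the ranger output, and the column names simply mirror the ones used above:

```r
# Toy stand-in for fig_dat (values are made up for illustration).
fig_dat <- data.frame(time   = c(5, 11, 12),
                      pred_1 = c(0.99, 0.97, 0.95),
                      fit_1  = c(0.98, 0.93, 0.90),
                      fit_2  = c(1.00, 0.99, 0.97))

value_cols <- c("pred_1", "fit_1", "fit_2")

# stack() puts the selected columns end to end, in column order,
# and records the source column name in `ind`.
stacked <- stack(fig_dat[, value_cols])

fig_dat_long <- data.frame(
  time            = rep(fig_dat$time, times = length(value_cols)),
  pred_fit        = as.character(stacked$ind),
  pred_fit_values = stacked$values
)
```

The rows come out grouped by curve rather than by time, which makes no difference to ggplot.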

Alternatively, you can build the plot directly in plotly:

# use `=` (not `<-`) inside data.frame() so the columns get the intended names
fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])


fig <- plot_ly(fig_dat, x = ~time, y = ~pred_1, name = 'pred1', type = 'scatter', mode = 'lines')
fig <- fig %>% add_trace(y = ~fit_1, name = 'fit 1', mode = 'lines') 
fig <- fig %>% add_trace(y = ~fit_2, name = 'fit 2', mode = 'lines')

fig  # print the object to display the plot

Merry Christmas :)

Update:

## make dataframe of variables to plot:
fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])

# To include the variables from 'a' we need to put them in the same dataframe for plotting.
# Trouble is, they are different lengths: the predicted data are a little shorter.
dim(fig_dat)
dim(a)
# We can join the two with inner join: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right
fig_dat_join <- inner_join(fig_dat, a, by = "time")
dim(fig_dat_join)
# now they are equal dimensions and joined together but we have a slight issue with duplicate values:
sort(a$time) # we can see here that time 53 appears twice for example
a$time[duplicated(a$time)] # this tells us which values in time are duplicated
sort(death_times) # some
death_times[duplicated(death_times)] #none
# because of the duplicates some combinations are returned: see rows 9 and 10
fig_dat_join 

# I'm not familiar with the analysis so I'm not sure what the correct way in this case is to remove the duplicates in 'a' so that the dimensions of 'a' match the output of 'r_fit'
# You might need to look into that but it might not make much difference to the visualisation

# I've not used plotly a great deal so there is probably a better way of doing this but I've done my best and included the links as comments: https://plotly-r.com/overview.html
# labels: https://plotly.com/r/figure-labels/
x_labs <- list(
  title = "Time")

y_labs <- list(
  title = "y axis")

# To include extra info in hovertext: https://stackoverflow.com/questions/49901771/how-to-set-different-text-and-hoverinfo-text

p1 <- plot_ly(data = fig_dat_join,
              x = ~time,
              # text = ~n,
              # textposition = "auto",
              # hoverinfo = "text",
              hovertext = paste("Time :", fig_dat_join$time,
                                "<br> Sex :", fig_dat_join$sex,
                                "<br> Inst :", fig_dat_join$inst,
                                "<br> ID :", fig_dat_join$ID,
                                "<br> Age :", fig_dat_join$age
                                )) %>% 
  add_trace(y = ~pred_1,
            type = 'scatter',
            name = 'Predictor 1',
            mode = 'lines') %>% 
  add_trace( y = ~fit_1,
            type = 'scatter',
            name = 'Fit 1',
            mode = 'lines') %>% 
  add_trace( y = ~fit_2,
             type = 'scatter',
             name = 'Fit 2',
             mode = 'lines') %>% 
  layout(xaxis = x_labs, yaxis = y_labs)

p1
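The duplicate-row issue mentioned in the comments of the code above can be reproduced on a toy example. This sketch uses base merge() as a stand-in for dplyr's inner_join() (both keep only matching keys, and both repeat a row when the key matches more than once). All values are invented:

```r
# Curve values at the unique death times (toy numbers).
fig_toy <- data.frame(time   = c(5, 11, 53),
                      pred_1 = c(0.99, 0.97, 0.80))

# Patient table: time 53 occurs twice, time 11 not at all.
a_toy <- data.frame(time = c(5, 53, 53, 60),
                    ID   = 1:4)

# Inner join on time: unmatched times are dropped,
# and the row for time 53 is returned once per matching patient.
joined <- merge(fig_toy, a_toy, by = "time")
```

So after such a join each row describes whichever patients happened to have an event at that time, not the patient the curve was predicted for, which is worth keeping in mind when reading the hover text.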

Update 2:

I was using inner_join() above to make the data frame a match unique.death.times. If you don't need that, we can move the hovertext code into each add_trace?

fig_dat <- data.frame(time = r_fit$unique.death.times,
                      pred_1 = t(pred[1,]),
                      fit_1 = r_fit$survival[2,],
                      fit_2 = r_fit$survival[3,])


p2 <- plot_ly(data = fig_dat,
              x = ~time,
              # text = ~n,
              # textposition = "auto",
              hoverinfo = "text"
) %>% 
  add_trace(y = ~pred_1,
            type = 'scatter',
            name = 'Predictor 1',
            mode = 'lines',
            hovertext = paste("Time :", fig_dat$time,
                              "<br> y axis :", fig_dat$pred_1,
                              "<br> Sex :", a$sex[1],
                              "<br> Inst :", a$inst[1],
                              "<br> ID :", a$ID[1],
                              "<br> Age :", a$age[1]
            )) %>% 
  add_trace( y = ~fit_1,
             type = 'scatter',
             name = 'Fit 1',
             mode = 'lines',
             hovertext = paste("Time :", fig_dat$time,
                               "<br> y axis :", fig_dat$fit_1,
                               "<br> Sex :", a$sex[2],
                               "<br> Inst :", a$inst[2],
                               "<br> ID :", a$ID[2],
                               "<br> Age :", a$age[2]
             )) %>% 
  add_trace( y = ~fit_2,
             type = 'scatter',
             name = 'Fit 2',
             mode = 'lines',
             hovertext = paste("Time :", fig_dat$time,
                               "<br> y axis :", fig_dat$fit_2,
                               "<br> Sex :", a$sex[3],
                               "<br> Inst :", a$inst[3],
                               "<br> ID :", a$ID[3],
                               "<br> Age :", a$age[3]
             )) %>% 
  layout(xaxis = x_labs, yaxis = y_labs)

p2
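The three near-identical add_trace() calls could also be written as a loop, which scales to more curves. The following is only a sketch under stated assumptions: make_hover() is a hypothetical helper, the times, surv and new objects are invented stand-ins for r_fit$unique.death.times, the predicted survival matrix and a[1:3,], and line = list(shape = "hv") is an optional plotly setting that draws each curve as steps (a "staircase") rather than straight segments:

```r
# Toy stand-ins (values made up for illustration).
times <- c(5, 11, 12, 13)
surv <- rbind(c(0.99, 0.97, 0.95, 0.90),
              c(0.98, 0.93, 0.90, 0.85),
              c(1.00, 0.99, 0.97, 0.96))
new <- data.frame(ID = 1:3, age = c(74, 68, 56), sex = c(1, 1, 2))

# Hypothetical helper: one hover string per time point for a single curve.
# `attrs` is a one-row data frame holding that curve's attributes.
make_hover <- function(time, surv_row, attrs) {
  vals <- vapply(attrs, as.character, character(1))
  attr_txt <- paste0("<br> ", names(attrs), " : ", vals, collapse = "")
  paste0("Time : ", time, "<br> y axis : ", round(surv_row, 3), attr_txt)
}

hover <- lapply(seq_len(nrow(new)),
                function(i) make_hover(times, surv[i, ], new[i, ]))

# Build the figure only if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  p3 <- plotly::plot_ly()
  for (i in seq_len(nrow(new))) {
    p3 <- plotly::add_trace(p3, x = times, y = surv[i, ],
                            type = "scatter", mode = "lines",
                            line = list(shape = "hv"),   # step-shaped line
                            name = paste("ID", new$ID[i]),
                            text = hover[[i]], hoverinfo = "text")
  }
}
```

Printing p3 displays the figure. Adding a column to new adds the matching line to every hover box without touching the plotting code.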

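【Discussion】:

Merry Christmas! Thank you so much - this is the best Christmas present I could ask for! Just 2 questions: 1) In the hover text, how can I change the label "r_fit$survival" to "y axis"? 2) For the hover text, could you also add information from the "a" object (e.g. age, sex, ID, ph.ecog, etc.)? Thanks, and Merry Christmas!
Is it possible to change the answer so that: p
I have updated the answer with a version that works for plotly. I have only just seen your comment about ggplot. I'll see whether I can do that too.
OK, I have changed the ggplot code above. ggplot likes long data, but after converting with ggplotly() I'm not sure how to adjust the hover text... maybe use the plotly code instead XD
Ah, OK, I hadn't understood that part. I thought there was only one value at each time. In that case I think we can forget the join. Is the new update what you need? Each add_trace now refers to one predictor or fit, and also to the first, second, or third index of data frame a. You may want to double-check some values to see whether they match what you expect!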
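On the open point about hover text after ggplotly(): one commonly used approach, sketched here on invented long-format data, is to put the desired hover string into a text aesthetic and ask ggplotly() to show only that via tooltip = "text". The explicit group aesthetic is there so the line is still drawn per curve once text varies by row. The column names (curve, surv, hover) are assumptions for illustration, not taken from the answer:

```r
# Toy long-format data: one row per (time, curve) pair, values made up.
long <- data.frame(
  time  = rep(c(5, 11, 12), times = 2),
  curve = rep(c("ID 1", "ID 2"), each = 3),
  surv  = c(0.99, 0.97, 0.95, 0.98, 0.93, 0.90),
  age   = rep(c(74, 68), each = 3)
)

# One hover string per row; <br> starts a new line in the hover box.
long$hover <- paste0("Time : ", long$time,
                     "<br> y axis : ", long$surv,
                     "<br> Age : ", long$age)

# Build the figure only if both packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE)) {
  gg <- ggplot2::ggplot(long,
                        ggplot2::aes(x = time, y = surv, colour = curve,
                                     group = curve, text = hover)) +
    ggplot2::geom_step()                      # step-shaped survival curves
  fig_gg <- plotly::ggplotly(gg, tooltip = "text")
}
```

Printing fig_gg displays the interactive version, and the hover box then contains exactly the string built in long$hover.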