Answers: plot multiple linear regressions from a tibble of intercepts and slopes using ggplot

【Question title】: Plot multiple linear regressions from tibble of intercepts and slopes using ggplot
【Posted】: 2020-09-23 18:14:33
【Question】:

I have a tibble, fit, holding the intercepts alpha and slopes beta from several linear regressions. The data are given below.

I would like to exponentiate each regression and plot all of them in the same plot window, so that y = exp(alpha + beta * x).

Here is a "test" plot I made while trying to work out how to do it:

ggplot() +
  stat_function(fun = ~ for (i in 1:nrow(fit)) {
    exp(fit$alpha[i] + fit$beta[i] * seq(0, 10, .01))
  }) +
  theme_classic() +
  xlim(0, 10) +
  ylim(0, 1000)

But it does not work, and I get the following warning, which I am not sure I understand:

Computation failed in `stat_function()`:
Elements must equal the number of rows or 1

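Also, is there a ggplot function that iterates over the regressions directly, so that I don't have to use a loop? I know geom_abline() can draw lines from intercepts and slopes, but then I cannot exponentiate the regressions.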

Data:

fit <- structure(list(alpha = c(6.4860289555265, 6.27396167268318, 
6.32039803132685, 6.30814751731013, 5.57998066302655, 6.48871720571395, 
6.33967399748598, 6.48688731183521, 6.26045779265403, 6.2953943578198, 
6.20448822286806, 6.50154201141595, 6.1774295664319, 6.02222511089118, 
6.2716610722266, 6.21255274086976, 5.79004244768028, 6.35653188128858, 
6.26422754017315, 6.11397557151798, 6.4758221837802, 6.20707829503994, 
6.11614443128677, 6.03290796195398, 6.04382957704095, 6.24508205522959, 
5.59411842610983, 6.33452203853571, 6.42799288311273, 6.21094379710094, 
5.96247571920146, 6.32340649837508, 6.00574461437739, 5.98586711865563, 
5.90996559415481, 5.85960458364359, 6.07748580916622, 6.38297427956585, 
6.30105414357071, 6.50276479896593, 6.35108145640532, 6.11115445717759, 
6.06048094442664, 6.39924383968502, 6.29705245347993, 6.132325962512, 
6.08533361080762, 6.11299308468399, 5.99317043822914, 6.64345246270652
), beta = c(-0.240706094587343, -0.118050194208012, -0.183066432959319, 
-0.155331773463964, -0.136034449469665, -0.148786968695725, -0.138424348731508, 
-0.182977715878648, -0.14492872413148, -0.0917393831564791, -0.137963572824426, 
-0.154072673769774, -0.197768747696995, -0.109498466316583, -0.134228657790289, 
-0.162007411722827, -0.120537296889171, -0.147596027060241, -0.144570831735452, 
-0.136825094924608, -0.193485685316959, -0.208054563949588, -0.138275798744531, 
-0.115652152539183, -0.0723231611644853, -0.19880444266469, -0.138168835432978, 
-0.132242987514684, -0.171978838679919, -0.164295833035347, -0.0986271579815662, 
-0.149522368532541, -0.196407247053081, -0.19111792294904, -0.132103384320777, 
-0.107138921917582, -0.109487704684017, -0.186037683605527, -0.258118158119251, 
-0.132779176452371, -0.17328572497824, -0.194029734577603, -0.116892149681328, 
-0.193838711732235, -0.15427710341968, -0.143054577800488, -0.115065744720938, 
-0.153687083514263, -0.138507868513552, -0.178604854161425)), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

【Question comments】:

Tags: r ggplot2

【Solution 1】:

This can be done with purrr::map and geom_function, adding one layer per regression:

library(ggplot2)

ggplot() +
  purrr::map(1:nrow(fit), ~ geom_function(fun = function(x) exp(fit$alpha[.x] + fit$beta[.x] * x))) +
  theme_classic() +
  xlim(0, 10) +
  ylim(0, 1000)
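
As a side note on why the attempt in the question fails, here is a minimal sketch. In R a `for` loop always evaluates to `NULL`, so the function handed to `stat_function()` never returns the curve values, which is probably what triggers the "Elements must equal the number of rows or 1" message. The names `fit_demo` and `make_curve` are made up for this illustration, and the two rows are not the question's data.

```r
# Small stand-in for `fit` (illustrative values, not the question's data)
fit_demo <- data.frame(alpha = c(6.5, 6.3), beta = c(-0.24, -0.12))
x <- seq(0, 10, 0.01)

# A for loop evaluates to NULL, whatever its body computes
loop_value <- for (i in seq_len(nrow(fit_demo))) {
  exp(fit_demo$alpha[i] + fit_demo$beta[i] * x)
}
is.null(loop_value)

# A function factory returns one vectorised function per regression
make_curve <- function(alpha, beta) {
  force(alpha)
  force(beta)
  function(x) exp(alpha + beta * x)
}
curves <- Map(make_curve, fit_demo$alpha, fit_demo$beta)

# Each curve returns one y value per x value
length(curves[[1]](x)) == length(x)
```

Each element of `curves` is an ordinary vectorised function of `x`, which is the kind of function `geom_function(fun = ...)` expects, one layer per regression.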

【Comments】:

Great, thanks! I didn't know I could use map() with ggplot like this. Good to know.

【Solution 2】:

You can compute x and y before calling ggplot, like this:

library(dplyr)
library(ggplot2)

fit %>%
 mutate(model = row_number()) %>%
 rowwise(model, alpha, beta) %>%
 summarise(x = seq(0, 10, .01),
           y = exp(alpha + beta * x)) %>% 
 
 ggplot() +
 geom_line(aes(x = x, y = y, colour = factor(model)), show.legend = FALSE) +
 theme_classic() +
 xlim(0, 10) +
 ylim(0, 1000)
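
The same precomputation can also be sketched in base R, which may be handy because recent dplyr versions (1.1.0 and later) warn when summarise() returns several rows per group and point to reframe() instead. The data frame `fit_demo` and its values are assumptions made for this illustration, not the question's data.

```r
# Illustrative stand-in for `fit`; not the question's data
fit_demo <- data.frame(alpha = c(6.5, 6.3, 5.6),
                       beta  = c(-0.24, -0.12, -0.14))
x <- seq(0, 10, 0.01)

# One row per (x, model) combination
curves <- expand.grid(x = x, model = seq_len(nrow(fit_demo)))

# Look up each row's coefficients by model index and exponentiate
curves$y <- exp(fit_demo$alpha[curves$model] +
                fit_demo$beta[curves$model] * curves$x)

head(curves)
```

The resulting `curves` data frame has the same columns (x, y, model) as the summarised tibble above, so it can be passed to the same geom_line() call.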

Edit:

A quick update to show how the dplyr and purrr approaches compare in terms of efficiency:

microbenchmark::microbenchmark(
dplyr = fit %>%
 mutate(model = row_number()) %>%
 rowwise(model, alpha, beta) %>%
 summarise(x = seq(0, 10, .01),
           y = exp(alpha + beta * x), 
           .groups = "drop") %>% 
 
 ggplot() +
 geom_line(aes(x = x, y = y, colour = factor(model)), show.legend = FALSE) +
 theme_classic() +
 xlim(0, 10) +
 ylim(0, 1000),


purrr = ggplot() +
 purrr::map(1:nrow(fit), ~ geom_function(fun = function(x) exp(fit$alpha[.x] + fit$beta[.x] * x))) +
 theme_classic() +
 xlim(0, 10) +
 ylim(0, 1000)

) %>% plot()

In this benchmark, the dplyr solution is faster.
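
One caveat about the benchmark above, offered as a sketch rather than a definitive measurement: it times only the construction of the ggplot object. With geom_function() the curves are evaluated later, when the plot is built for drawing, whereas the dplyr version computes all points up front. The example below separates the two steps using ggplotGrob(); `fit_demo` and `build_layers` are hypothetical names and the values are not the question's data.

```r
library(ggplot2)

# Illustrative stand-in for `fit`; not the question's data
fit_demo <- data.frame(alpha = c(6.5, 6.3, 5.6),
                       beta  = c(-0.24, -0.12, -0.14))

# One geom_function() layer per regression, as in the purrr answer
build_layers <- function() {
  ggplot() +
    lapply(seq_len(nrow(fit_demo)), function(i) {
      geom_function(fun = function(x) exp(fit_demo$alpha[i] + fit_demo$beta[i] * x))
    }) +
    xlim(0, 10)
}

# Step 1: construct the plot object (the curves are not evaluated yet)
t_construct <- system.time(p <- build_layers())

# Step 2: build the drawable grob, which evaluates every curve
t_build <- system.time(g <- ggplotGrob(p))

# Elapsed seconds for each step
c(construct = t_construct[["elapsed"]], build = t_build[["elapsed"]])
```

Wrapping each benchmarked expression in ggplotGrob() would be one way to compare the two answers on the full cost of producing a drawable plot.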

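【Comments】:

Thanks for your input. It works, but I find @stefan's approach more straightforward.
OK, fair enough. In case you are interested in time efficiency, I have updated my answer to show you the difference.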