【Question title】: Plot multiple linear regressions from tibble of intercepts and slopes using ggplot
【Posted】: 2020-09-23 18:14:33
【Question】:

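I have a tibble `fit` holding the intercepts `alpha` and slopes `beta` from multiple linear regressions. The data are given below.

I would like to exponentiate each regression and plot them all in the same plot window, so that y = exp(alpha + beta * x).

Here is a "test" plot I made to figure out how to do it: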

ggplot() +
  stat_function(fun = ~ for (i in 1:nrow(fit)) {
    exp(fit$alpha[i] + fit$beta[i] * seq(0, 10, .01))
  }) +
  theme_classic() +
  xlim(0, 10) +
  ylim(0, 1000)

But it does not work, and I get the following warning, which I am not sure I understand:

Computation failed in `stat_function()`:
Elements must equal the number of rows or 1 

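Also, is there a ggplot function that iterates over each regression directly, so that I don't have to use a loop? I know geom_abline() can do this from intercepts and slopes, but then I cannot exponentiate the regressions.

Data: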

structure(list(alpha = c(6.4860289555265, 6.27396167268318, 
6.32039803132685, 6.30814751731013, 5.57998066302655, 6.48871720571395, 
6.33967399748598, 6.48688731183521, 6.26045779265403, 6.2953943578198, 
6.20448822286806, 6.50154201141595, 6.1774295664319, 6.02222511089118, 
6.2716610722266, 6.21255274086976, 5.79004244768028, 6.35653188128858, 
6.26422754017315, 6.11397557151798, 6.4758221837802, 6.20707829503994, 
6.11614443128677, 6.03290796195398, 6.04382957704095, 6.24508205522959, 
5.59411842610983, 6.33452203853571, 6.42799288311273, 6.21094379710094, 
5.96247571920146, 6.32340649837508, 6.00574461437739, 5.98586711865563, 
5.90996559415481, 5.85960458364359, 6.07748580916622, 6.38297427956585, 
6.30105414357071, 6.50276479896593, 6.35108145640532, 6.11115445717759, 
6.06048094442664, 6.39924383968502, 6.29705245347993, 6.132325962512, 
6.08533361080762, 6.11299308468399, 5.99317043822914, 6.64345246270652
), beta = c(-0.240706094587343, -0.118050194208012, -0.183066432959319, 
-0.155331773463964, -0.136034449469665, -0.148786968695725, -0.138424348731508, 
-0.182977715878648, -0.14492872413148, -0.0917393831564791, -0.137963572824426, 
-0.154072673769774, -0.197768747696995, -0.109498466316583, -0.134228657790289, 
-0.162007411722827, -0.120537296889171, -0.147596027060241, -0.144570831735452, 
-0.136825094924608, -0.193485685316959, -0.208054563949588, -0.138275798744531, 
-0.115652152539183, -0.0723231611644853, -0.19880444266469, -0.138168835432978, 
-0.132242987514684, -0.171978838679919, -0.164295833035347, -0.0986271579815662, 
-0.149522368532541, -0.196407247053081, -0.19111792294904, -0.132103384320777, 
-0.107138921917582, -0.109487704684017, -0.186037683605527, -0.258118158119251, 
-0.132779176452371, -0.17328572497824, -0.194029734577603, -0.116892149681328, 
-0.193838711732235, -0.15427710341968, -0.143054577800488, -0.115065744720938, 
-0.153687083514263, -0.138507868513552, -0.178604854161425)), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

【Comments】:

    Tags: r ggplot2


    【Solution 1】:

    Using purrr::map and geom_function, this can be done like so:

    library(ggplot2)
    
    ggplot() +
      purrr::map(1:nrow(fit), ~ geom_function(fun = function(x) exp(fit$alpha[.x] + fit$beta[.x] * x))) +
      theme_classic() +
      xlim(0, 10) +
      ylim(0, 1000)
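
A likely reason the original attempt failed, sketched in base R: a `for` loop evaluates to NULL, so a function whose body is only a loop returns nothing for `stat_function()` to plot, whereas `lapply()` (or `purrr::map()`) keeps one result per regression. ggplot2 accepts a list of layers on the right-hand side of `+`, which is why mapping over the rows works. The vectors `alpha`, `beta` and `x` below are hypothetical stand-ins with values rounded from the question's data.

```r
# Hypothetical stand-ins for two rows of `fit` (rounded from the question's data)
alpha <- c(6.49, 6.27)
beta  <- c(-0.24, -0.12)
x <- c(0, 5, 10)

# The value computed inside a `for` loop is discarded; the loop itself yields NULL
loop_fun <- function(x) {
  for (i in seq_along(alpha)) {
    exp(alpha[i] + beta[i] * x)
  }
}
loop_result <- loop_fun(x)
is.null(loop_result)

# lapply() returns a list with one curve per regression, each as long as x
curve_list <- lapply(seq_along(alpha), function(i) exp(alpha[i] + beta[i] * x))
lengths(curve_list)
```

In the answer's code, each list element is a `geom_function()` layer rather than a numeric vector, but the iteration pattern is the same.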
    

    【Comments】:

    • Great, thanks! I didn't know I could use functions like map() with ggplot. Good to know.
    【Solution 2】:

    You can compute x and y before calling ggplot, like this:

    library(dplyr)
    library(ggplot2)
    
    fit %>%
     mutate(model = row_number()) %>%
     rowwise(model, alpha, beta) %>%
     summarise(x = seq(0, 10, .01),
               y = exp(alpha + beta * x)) %>% 
     
     ggplot() +
     geom_line(aes(x = x, y = y, colour = factor(model)), show.legend = FALSE) +
     theme_classic() +
     xlim(0, 10) +
     ylim(0, 1000)
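
The same long-format table can also be built without dplyr. The following is a minimal base R sketch of that idea, not the answer author's method; `fit_demo`, `x_grid` and `curves` are hypothetical names, and the three rows are rounded from the question's data.

```r
# Hypothetical stand-in for `fit`: three rows rounded from the question's data
fit_demo <- data.frame(
  alpha = c(6.49, 6.27, 6.32),
  beta  = c(-0.24, -0.12, -0.18)
)

x_grid <- seq(0, 10, by = 0.01)  # same grid as in the answer

# Build one block of rows per regression, then stack the blocks
curves <- do.call(rbind, lapply(seq_len(nrow(fit_demo)), function(i) {
  data.frame(
    model = i,
    x = x_grid,
    y = exp(fit_demo$alpha[i] + fit_demo$beta[i] * x_grid)
  )
}))

dim(curves)  # one row per (model, x) pair, three columns
```

The resulting `curves` can be passed to `ggplot()` with `geom_line(aes(x = x, y = y, group = model))`. Note that from dplyr 1.1.0 onward, `summarise()` calls returning several rows per group are deprecated in favour of `reframe()`, so the dplyr pipeline may emit a deprecation warning on recent versions.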
    


    Edit:

    A quick update to compare the dplyr and purrr approaches in terms of efficiency:

    microbenchmark::microbenchmark(
    dplyr = fit %>%
     mutate(model = row_number()) %>%
     rowwise(model, alpha, beta) %>%
     summarise(x = seq(0, 10, .01),
               y = exp(alpha + beta * x), 
               .groups = "drop") %>% 
     
     ggplot() +
     geom_line(aes(x = x, y = y, colour = factor(model)), show.legend = FALSE) +
     theme_classic() +
     xlim(0, 10) +
     ylim(0, 1000),
    
    
    purrr = ggplot() +
     purrr::map(1:nrow(fit), ~ geom_function(fun = function(x) exp(fit$alpha[.x] + fit$beta[.x] * x))) +
     theme_classic() +
     xlim(0, 10) +
     ylim(0, 1000)
    
    ) %>% plot()
    

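    According to this benchmark, the dplyr solution is faster.

    【Comments】:

    • Thanks for your input. It works, but I find @stefan's approach more straightforward.
    • OK, fair enough. In case you are interested in time efficiency, I updated my answer to show you the difference.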