Sequentially store the results of a series of regressions into a dataframe: answers

【Question title】: Sequentially store the results of a series of regressions into a dataframe
【Posted】: 2019-09-18 11:42:33
【Question description】:

Suppose I want to run a series of regressions like the following:

summary(lm(mpg ~ cyl, data = mtcars))
summary(lm(mpg ~ disp, data = mtcars))
summary(lm(mpg ~ wt, data = mtcars))

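I want to create a data frame that holds the estimate and standard error from each output, preferably together with the variable name. The final output should look something like this: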

Variable  Beta  Coeff
cyl       -2.8  .32
disp      -.04  .004
wt        -5.3  .56

I think this calls for a function. Any ideas?

【Question comments】:

stackoverflow.com/questions/31143423/…

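Tags: r function regression

【Solution 1】:

One option is to loop over the columns of interest, paste together the formula for lm, tidy the output, slice off the first row (the intercept), and select the columns of interest: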

library(broom)
library(tidyverse)
map_df(c("cyl", "disp", "wt"), ~
      lm(paste0("mpg ~ ", .x), data = mtcars) %>% 
          tidy %>% 
          slice(-1) %>% 
          select(Variable = term, Beta = estimate, Coeff = std.error))
# A tibble: 3 x 3
#  Variable    Beta   Coeff
#  <chr>      <dbl>   <dbl>
#1 cyl      -2.88   0.322  
#2 disp     -0.0412 0.00471
#3 wt       -5.34   0.559

Or using base R:

t(sapply(c("cyl", "disp", "wt"), function(x) 
   summary(lm(paste0("mpg ~ ", x), data = mtcars))$coefficients[-1, 1:2]))
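The base R call above returns a matrix with the variable names as row names, not the three-column data frame the question asks for. A minimal sketch of one way to close that gap, assuming the column names from the question (where "Coeff" holds the standard error) and using reformulate in place of paste0; the names vars, coefs and result are illustrative only:

```r
# Predictors to regress mpg on, one at a time (taken from the question).
vars <- c("cyl", "disp", "wt")

# For each predictor, fit mpg ~ <predictor> and keep the slope row
# (row 2; row 1 is the intercept) of the coefficient table.
coefs <- t(sapply(vars, function(x) {
  fit <- lm(reformulate(x, response = "mpg"), data = mtcars)
  coef(summary(fit))[2, c("Estimate", "Std. Error")]
}))

# Assemble the data frame. Column names follow the question;
# note that "Coeff" actually holds the standard error.
result <- data.frame(
  Variable  = rownames(coefs),
  Beta      = coefs[, "Estimate"],
  Coeff     = coefs[, "Std. Error"],
  row.names = NULL
)
print(result)
```

Indexing the coefficient columns by name instead of by position (1:2) is a small design choice that should keep the code readable if the columns are ever reordered.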

【Comments】:

【Solution 2】:

A simple approach is to use the purrr and broom packages from the tidyverse.

library(purrr)
library(broom)
cols <- c("cyl", "disp", "wt")

map_df(cols, ~lm(reformulate(.x, "mpg"), data=mtcars) %>% tidy())
#   term        estimate std.error statistic  p.value
#   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
# 1 (Intercept)  37.9      2.07        18.3  8.37e-18
# 2 cyl          -2.88     0.322       -8.92 6.11e-10
# 3 (Intercept)  29.6      1.23        24.1  3.58e-21
# 4 disp         -0.0412   0.00471     -8.75 9.38e-10
# 5 (Intercept)  37.3      1.88        19.9  8.24e-19
# 6 wt           -5.34     0.559       -9.56 1.29e-10

This gives you some extra information, but you can easily filter it out with dplyr if you like.

【Comments】:
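As a rough sketch of that filtering step, assuming the goal is the three-column layout from the question: drop the intercept rows with filter, then keep and rename the columns with select. The object name slopes is illustrative, and the purrr, broom and dplyr packages are assumed to be installed.

```r
library(purrr)
library(broom)
library(dplyr)

cols <- c("cyl", "disp", "wt")

# Fit one model per predictor and stack the tidy coefficient tables,
# then drop the intercept rows and keep only the requested columns.
slopes <- map_df(cols, ~ tidy(lm(reformulate(.x, "mpg"), data = mtcars))) %>%
  filter(term != "(Intercept)") %>%
  select(Variable = term, Beta = estimate, Coeff = std.error)

print(slopes)
```

Filtering on the term name instead of slicing by row position keeps working when a model has more than one non-intercept term.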

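This is great. OK, let me push you a bit further. What if my independent variable names are labeled sequentially, e.g. "Var_1", "Var_2", "Var_3"? Is there a way to run the same procedure without typing each name individually?
You can build the cols vector however you need. For your example that would be cols <- paste0("Var_", 1:3)
Nice and simple! Thanks.
One last (probably simple) follow-up, since I have never worked with this kind of code before: where would I put covariates in this equation? I tried: map_df(cols, ~lm(reformulate(.x + drat, "mpg"), data=mtcars) %>% tidy())
Well, I'd suggest you first look at the ?reformulate help page. You need to pass in a character vector of column names. With map_df, .x will be one of the names from cols. You use c() to add values to a vector, not +.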
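The advice in the last comment can be sketched as follows, here in base R so that it runs without extra packages. It assumes drat is the fixed covariate from the follow-up question and that only the slope of the looped predictor is of interest; the names var_names, f and res are illustrative only.

```r
# Sequentially numbered names can be generated instead of typed out.
var_names <- paste0("Var_", 1:3)   # "Var_1" "Var_2" "Var_3"

# reformulate() takes a character vector of term labels, so the covariate
# is added with c(), not with +.
f <- reformulate(c("cyl", "drat"), response = "mpg")   # mpg ~ cyl + drat

cols <- c("cyl", "disp", "wt")

# Fit mpg ~ <predictor> + drat for each predictor and keep the
# estimate and standard error of that predictor only.
res <- do.call(rbind, lapply(cols, function(x) {
  fit <- lm(reformulate(c(x, "drat"), response = "mpg"), data = mtcars)
  est <- coef(summary(fit))[x, c("Estimate", "Std. Error")]
  data.frame(Variable = x,
             Beta     = est[["Estimate"]],
             Coeff    = est[["Std. Error"]])
}))
print(res)
```

In the purrr version from the answer, the equivalent change would be reformulate(c(.x, "drat"), "mpg") inside the map_df call.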