How to loop through a dataframe and run a regression in R?

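【Question title】: How to loop through dataframe and run regression in R?
【Posted】: 2021-11-01 05:00:01
【Question】:

I'm trying to run a multiple regression on a dataframe in R, but I'm having a lot of trouble working out how to iterate over each distinct string in each column.

I have the following dataframe: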

Category      Zone      Season       P1   P2    P3     Value     
-------------------------------------------------------------
   Blue         D1      Winter       1     4     4        55         
   Blue         D1      Winter       3     5     3        23         
   Blue         D1      Winter       5     3     1        25           
   ...          
   Blue         D1      Spring       3     3     2        32
   Blue         D1      Spring       2     2     3        23         
   Blue         D1      Spring       5     4     5        53 
   ...

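I would like to write a for loop that, for each Category, each Zone, and each Season, runs a multiple regression of Value on P1, P2, and P3 as the independent variables (P stands for "parameter"). Note that the real analysis has many more rows: every combination of Category, Zone, and Season has many, many combinations of parameter values.

Can this be done easily in R? I feel like it would be a few lines of code using lapply(), but I'm still confused about how to actually organize the process.

I would appreciate any guidance on this problem! Thanks!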

【Comments】:

Here's one approach I like: cran.r-project.org/web/packages/broom/vignettes/…

Tags: r dataframe loops regression

【Solution 1】:

Here is an application of the process described in the link I posted in my comment above.

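library(tidyverse); library(broom)
df1 %>% 
   # one row per Category/Zone/Season; that group's P1..Value rows are nested in `data`
   nest(data = c(P1:Value)) %>%
   # regress Value on all other nested columns (P1, P2, P3) within each group,
   # then convert each fit into a tidy coefficient table
   mutate(fit = map(data, ~lm(Value ~ ., data = .x)),
          tidied = map(fit, tidy)) %>%
   # expand the coefficient tables into one row per term
   unnest(tidied)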

Result

# A tibble: 8 x 10
  Category Zone  Season data             fit    term        estimate std.error statistic p.value
  <chr>    <chr> <chr>  <list>           <list> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 Blue     D1    Winter <tibble [3 × 4]> <lm>   (Intercept)   111.         NaN       NaN     NaN
2 Blue     D1    Winter <tibble [3 × 4]> <lm>   P1            -10.3        NaN       NaN     NaN
3 Blue     D1    Winter <tibble [3 × 4]> <lm>   P2            -11.3        NaN       NaN     NaN
4 Blue     D1    Winter <tibble [3 × 4]> <lm>   P3             NA           NA        NA      NA
5 Blue     D1    Spring <tibble [3 × 4]> <lm>   (Intercept)     5.00       NaN       NaN     NaN
6 Blue     D1    Spring <tibble [3 × 4]> <lm>   P1             12.0        NaN       NaN     NaN
7 Blue     D1    Spring <tibble [3 × 4]> <lm>   P2             -3.00       NaN       NaN     NaN
8 Blue     D1    Spring <tibble [3 × 4]> <lm>   P3             NA           NA        NA      NA
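
A note on reading this output: each group in the sample has only 3 rows, while the model has 4 coefficients (the intercept plus P1, P2, and P3). lm() can therefore estimate at most three of them, reports NA for the last one, and has no residual degrees of freedom left, so the standard errors come out as NaN. With many rows per group, as the question describes for the real data, this should not occur. Below is a minimal base R sketch for checking group sizes before fitting. The names n_coef, counts, n_rows, and enough_rows are made up for illustration, and the check assumes the predictors are not collinear within a group.

```r
# Sample data copied from the "Data" section of this answer
df1 <- data.frame(
  Category = rep("Blue", 6),
  Zone     = rep("D1", 6),
  Season   = rep(c("Winter", "Spring"), each = 3),
  P1       = c(1L, 3L, 5L, 3L, 2L, 5L),
  P2       = c(4L, 5L, 3L, 3L, 2L, 4L),
  P3       = c(4L, 3L, 1L, 2L, 3L, 5L),
  Value    = c(55L, 23L, 25L, 32L, 23L, 53L),
  stringsAsFactors = FALSE
)

# Number of coefficients in Value ~ P1 + P2 + P3 (intercept + 3 slopes)
n_coef <- 4

# Count the rows in every Category/Zone/Season combination
counts <- aggregate(Value ~ Category + Zone + Season, data = df1, FUN = length)
names(counts)[names(counts) == "Value"] <- "n_rows"

# Standard errors need more rows than coefficients
counts$enough_rows <- counts$n_rows > n_coef
print(counts)
```

Groups where enough_rows is FALSE will show the same NA/NaN pattern as the table above.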

Data

df1 <- data.frame(
      stringsAsFactors = FALSE,
              Category = c("Blue", "Blue", "Blue", "Blue", "Blue", "Blue"),
                  Zone = c("D1", "D1", "D1", "D1", "D1", "D1"),
                Season = c("Winter","Winter","Winter",
                           "Spring","Spring","Spring"),
                    P1 = c(1L, 3L, 5L, 3L, 2L, 5L),
                    P2 = c(4L, 5L, 3L, 3L, 2L, 4L),
                    P3 = c(4L, 3L, 1L, 2L, 3L, 5L),
                 Value = c(55L, 23L, 25L, 32L, 23L, 53L)
    )
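
Since the question mentioned lapply(), here is a rough base R equivalent of the pipeline above, without tidyverse or broom. It is only a sketch: the names groups, fits, and coefs are made up for illustration, and the formula spells out P1 + P2 + P3 because each piece produced by split() still contains the grouping columns.

```r
# Sample data, same as df1 above
df1 <- data.frame(
  Category = rep("Blue", 6),
  Zone     = rep("D1", 6),
  Season   = rep(c("Winter", "Spring"), each = 3),
  P1       = c(1L, 3L, 5L, 3L, 2L, 5L),
  P2       = c(4L, 5L, 3L, 3L, 2L, 4L),
  P3       = c(4L, 3L, 1L, 2L, 3L, 5L),
  Value    = c(55L, 23L, 25L, 32L, 23L, 53L),
  stringsAsFactors = FALSE
)

# One data frame per Category/Zone/Season combination;
# drop = TRUE skips combinations that never occur in the data
groups <- split(df1, list(df1$Category, df1$Zone, df1$Season), drop = TRUE)

# Fit the same regression within each group
fits <- lapply(groups, function(d) lm(Value ~ P1 + P2 + P3, data = d))

# Collect the coefficients into a matrix with one row per group
coefs <- t(sapply(fits, coef))
print(coefs)

# Full summary for a single combination
summary(fits[["Blue.D1.Winter"]])
```

The list names are the three grouping values joined with dots (for example "Blue.D1.Winter"), so any single model can be pulled out by name.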

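【Comments】:

Thanks for the suggestion, I definitely need to learn more about the tidyverse. However, once I've generated "df1" following your instructions, how do I run the regression and get a summary for each combination of Category, Zone, and Season?

df1 is just the data you provided, expressed as code that can be run on another computer. The "Result" section of my answer shows the regression summaries produced by running the code at the top. For example, the first four rows show the regression for Blue/D1/Winter, with the estimated coefficients for the Intercept, P1, P2, and P3. If that's not what you need, can you describe in more detail what you're looking for?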