如何编写一个函数来使用 lapply 或 purrr 来 broom::tidy （polr）模型输出列表？答案

【问题标题】：How to write a function to use lapply or purrr to broom::tidy a list of (polr) model outputs?如何编写一个函数来使用 lapply 或 purrr 来 broom::tidy （polr）模型输出列表？
【发布时间】：2021-05-19 02:51:04
【问题描述】：

我正在运行具有不同变量等的有序 logit 模型列表。我想将输出转换为整洁的 tibble 以在 ggplot 等中使用。（我还想保存“常规模型输出”，所以我想分开做。）

我想以自动化方式执行此操作，使用 purrr 或 lapply 等，以便能够首先“运行所有模型”（自动化这是稍后的另一个问题）然后“整理所有模型”，后者大概会生成一个 tibbles 列表。

我尝试了以下方法，但它抛出：Error: No tidy method recognized for this list.

clean_model <- function(polr_results) {
  lapply(polr_results,  
    broom::tidy(polr_results, conf.int = TRUE, exponentiate = TRUE) %>%
      filter(coef.type=="coefficient")  %>% 
      dplyr::arrange(-str_detect(term, 'd2sd'))
      )
}

mtcars_m1 <- mtcars %>% polr(as.factor(cyl) ~ hp , data = ., Hess = TRUE) 
mtcars_m2 <- mtcars %>% polr(as.factor(cyl) ~ hp + qsec , data = ., Hess = TRUE) 

clean_model(c(mtcars_m1, mtcars_m2))

【问题讨论】：

标签： r dplyr lapply purrr broom

【解决方案1】：

这样的？

library(broom)
library(tidyverse)

clean_model <- function(polr_results) {
  lapply(polr_results,  function(x) {
    broom::tidy(x, conf.int = TRUE, exponentiate = TRUE) %>%
      filter(coef.type=="coefficient")
  })
}

clean_model(list(mtcars_m1, mtcars_m2))

#[[1]]
# A tibble: 1 x 7
#  term  estimate std.error statistic conf.low conf.high coef.type  
#  <chr>    <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>      
#1 hp        1.12    0.0399      2.90     1.06      1.26 coefficient

#[[2]]
# A tibble: 2 x 7
#  term  estimate std.error statistic conf.low conf.high coef.type  
#  <chr>    <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>      
#1 hp        1.13    0.0452     2.72     1.06       1.29 coefficient
#2 qsec      1.18    0.369      0.448    0.538      2.51 coefficient

【讨论】：

成功了，再次感谢@ronakshah。我的错误是 1. 不理解 lapply 需要定义的 function 2. 传递字符向量而不是列表

【解决方案2】：

另一种方法是使用 purrr，将您想要的所有不同公式放入数据框列表列中：

library(MASS)
library(tidyverse)
library(broom)

formula_dfs <- tibble(formula_id = 1:2,
                      formula = c(as.formula(as.factor(cyl) ~ hp),
                                  as.formula(as.factor(cyl) ~ hp + qsec))) 

formula_dfs
#> # A tibble: 2 x 2
#>   formula_id formula  
#>        <int> <list>   
#> 1          1 <formula>
#> 2          2 <formula>

formula_dfs %>%
  mutate(polr_fit  = map(formula, polr, data = mtcars, Hess = TRUE),
         polr_coef = map(polr_fit, tidy, conf.int = TRUE, exponentiate = TRUE)) %>%
  unnest(polr_coef) %>%
  filter(coef.type=="coefficient")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#> # A tibble: 3 x 10
#>   formula_id formula   polr_fit term  estimate std.error statistic conf.low
#>        <int> <list>    <list>   <chr>    <dbl>     <dbl>     <dbl>    <dbl>
#> 1          1 <formula> <polr>   hp        1.12    0.0399     2.90     1.06 
#> 2          2 <formula> <polr>   hp        1.13    0.0452     2.72     1.06 
#> 3          2 <formula> <polr>   qsec      1.18    0.369      0.448    0.538
#> # … with 2 more variables: conf.high <dbl>, coef.type <chr>

^{由reprex package (v2.0.0) 于 2021-05-24 创建}

polr_fit 列中仍然存在您的常规模型输出。

【讨论】：