Purrr (or broom) for computing proportional test for grouped dataset (Multiple proportions test): answers

【Question title】: Purrr (or broom) for computing proportional test for grouped dataset (Multiple proportions test)
【Posted】: 2020-07-18 15:55:27
【Question description】:

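Suppose I have a data frame consisting of "year" and "cognitive impairment" (1 = yes, 0 = otherwise).

I want to test the proportion for each year. For 2000, that would be: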

# rev() puts the count of 1s first, so prop.test() treats it as the success count
df %>% 
  filter(year == 2000) %>% 
  {prop.test(rev(table(.$cogimp)), p = 0.5, conf.level = 0.95)}

I can check the result with:

prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)

However, it seems to me that I could make these analyses simpler by using broom or purrr. My goal is to have a table like this:

The code is as follows:

df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                              2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                              2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                              2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                              2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                              2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                              2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                              2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                              2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                              2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                        1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                         "tbl", "data.frame"))

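library(dplyr)  # provides %>%, count() and filter()

# Number of 0s and 1s per year
df %>% 
  count(year, cogimp)

# Test for a single year, computed from the raw data
df %>% 
  filter(year == 2006) %>% 
  {prop.test(rev(table(.$cogimp)), p = 0.5, conf.level = 0.95)}

# Manual checks using the counts: 2000 (3 of 30) and 2006 (2 of 19)
prop.test(x = 3, n = 30, p = 0.5, conf.level = 0.95)
prop.test(x = 2, n = 19, p = 0.5, conf.level = 0.95)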
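As a building block for the table described above, a single prop.test() result can be flattened into a one-row data frame with broom::tidy(). The following is a minimal sketch for the year 2000 counts; the object names fit_2000 and tidy_2000 are only illustrative.

```r
library(broom)

# One-sample proportion test for year 2000: 3 impaired out of 30.
fit_2000 <- prop.test(x = 3, n = 30, p = 0.5, conf.level = 0.95)

# tidy() turns the htest object into a one-row tibble
# (estimate, statistic, p.value, conf.low, conf.high, ...).
tidy_2000 <- tidy(fit_2000)

# Keep only the columns of interest.
tidy_2000[, c("estimate", "p.value", "conf.low", "conf.high")]
```

Repeating this for every year and stacking the rows would give the per-year table.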

【Question discussion】:

Tags: r loops dplyr tidyverse purrr

【Solution 1】:

Use tidy from the broom package. Adapted from https://stackoverflow.com/a/30015869/13157536

library(dplyr)
library(broom)

df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                              2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                              2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                              2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                              2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                              2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                              2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                              2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                              2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                              2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                        1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                         "tbl", "data.frame"))

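# Count impaired cases and group size per year, then fit one test per year.
df_test <- df %>% 
  group_by(year) %>%
  summarize(cogimp = sum(cogimp), n = n()) %>%
  # Group by all three columns so do() keeps them next to the fitted test
  group_by(year, cogimp, n) %>%
  do(fitYear = prop.test(.$cogimp, .$n, p = 0.5, conf.level = 0.95))

# Extract the test results and keep the p-value for each year
tidy(df_test, fitYear) %>%
  select(year, cogimp, n, p.value)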
#> # A tibble: 4 x 4
#> # Groups:   year, cogimp, n [4]
#>    year cogimp     n   p.value
#>   <dbl>  <dbl> <int>     <dbl>
#> 1  2000      3    30 0.0000268
#> 2  2006      2    19 0.00132  
#> 3  2010      8    20 0.502    
#> 4  2015      3    31 0.0000163

Created on 2020-04-06 by the reprex package (v0.3.0)
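The answer above relies on do() and on calling tidy() on the resulting rowwise data frame. As far as I know, newer broom releases (0.7.0 onward) no longer provide tidiers for rowwise data frames, so that call may fail there. Below is a hedged sketch of the same per-year test written with purrr::map2() and tidyr::unnest() instead. To keep it short, it starts from the per-year counts shown in the output above rather than the raw df; the names counts and result are illustrative only.

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

# Per-year counts copied from the output above (cogimp = number of 1s).
# With the raw data these would come from:
#   df %>% group_by(year) %>% summarize(cogimp = sum(cogimp), n = n())
counts <- tibble(
  year   = c(2000, 2006, 2010, 2015),
  cogimp = c(3, 2, 8, 3),
  n      = c(30L, 19L, 20L, 31L)
)

result <- counts %>%
  mutate(
    # One prop.test() per row, stored in a list column
    test   = map2(cogimp, n, ~ prop.test(.x, .y, p = 0.5, conf.level = 0.95)),
    # One-row tibble of statistics for each test
    tidied = map(test, tidy)
  ) %>%
  select(-test) %>%
  unnest(tidied) %>%
  select(year, cogimp, n, estimate, p.value, conf.low, conf.high)

result
```

The p.value column should agree with the table above, and estimate, conf.low and conf.high add the observed proportion and its 95% confidence interval for each year.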

【Discussion】: