[Question title]: Purrr (or broom) for computing proportional test for grouped dataset (Multiple proportions test)
[Posted]: 2020-07-18 15:55:27
[Question description]:

Suppose I have a data frame with two columns, `year` and `cogimp` (cognitive impairment: 1 = yes, 0 = otherwise).

I want to test the proportion of impaired cases within each year. For 2000, that would be:

 df %>% 
  filter(year == 2000) %>% 
  {prop.test(rev(table(.$cogimp)),p = 0.5, conf.level=0.95)}

I can check the result with:

prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)
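As a small illustration of why the two calls agree: `table()` orders the levels as 0 then 1, so `rev()` moves the count of 1s to the front, and `prop.test()` treats the first entry of a two-entry table as the number of successes. The vector below is a hypothetical stand-in for a single year (3 impaired out of 30), not the actual data.

```r
# Hypothetical stand-in for one year's data: 3 impaired out of 30
cogimp <- rep(c(1, 0), times = c(3, 27))

counts  <- table(cogimp)  # levels are ordered by value: "0" first, then "1"
flipped <- rev(counts)    # "1" first, so impairment is counted as the "success"

# Same test expressed two ways: from the flipped table, and from raw counts
from_table  <- prop.test(flipped, p = 0.5, conf.level = 0.95)
from_counts <- prop.test(x = 3, n = 30, p = 0.5, conf.level = 0.95)

# TRUE if both calls produce the same p-value
all.equal(from_table$p.value, from_counts$p.value)
```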

However, it seems to me that these analyses could be made simpler with broom or purrr. My goal is to have a table like this:

The code is as follows:

df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                              2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                              2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                              2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                              2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                              2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                              2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                              2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                              2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                              2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                        1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                         "tbl", "data.frame"))

df %>% 
  count(year, cogimp)

df %>% 
  filter(year == 2006) %>% 
  {prop.test(rev(table(.$cogimp)),p = 0.5, conf.level=0.95)}

prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)
prop.test(x = 2, n = 19, p = 0.5, conf.level=0.95)

[Question comments]:

    Tags: r loops dplyr tidyverse purrr


    [Solution 1]:

    Use `tidy` from the broom package. Adapted from https://stackoverflow.com/a/30015869/13157536

    library(dplyr)
    library(broom)
    
    df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                                  2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                                  2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                                  2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                                  2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                                  2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                                  2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                                  2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                                  2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                                  2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                            0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                            0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                            1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                             "tbl", "data.frame"))
    
    df_test <- df %>% 
      group_by(year) %>%
      summarize(cogimp = sum(cogimp), n = n()) %>%
      group_by(year, cogimp, n) %>%
      do(fitYear = prop.test(.$cogimp, .$n, p = 0.5, conf.level = 0.95))
    
    tidy(df_test, fitYear) %>%
      select(year, cogimp, n, p.value)
    #> # A tibble: 4 x 4
    #> # Groups:   year, cogimp, n [4]
    #>    year cogimp     n   p.value
    #>   <dbl>  <dbl> <int>     <dbl>
    #> 1  2000      3    30 0.0000268
    #> 2  2006      2    19 0.00132  
    #> 3  2010      8    20 0.502    
    #> 4  2015      3    31 0.0000163
    

    Created on 2020-04-06 by the reprex package (v0.3.0)
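The answer above relies on `do()` and on calling `tidy()` on the resulting rowwise data frame. `do()` has been superseded since dplyr 1.0.0, and newer broom releases appear to have dropped the data-frame method, so the call may not run on a current installation. The following is a minimal sketch of the same idea using `purrr::map2()` and `tidyr::unnest()`; it is an alternative, not the answerer's method. The `df` built here is a stand-in that only reproduces the per-year counts shown in the output above (3/30, 2/19, 8/20, 3/31), and the column name `test` is chosen for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Stand-in data with the same per-year counts as the output above
df <- tibble(
  year   = rep(c(2000, 2006, 2010, 2015), times = c(30, 19, 20, 31)),
  cogimp = c(rep(c(1, 0), times = c(3, 27)),
             rep(c(1, 0), times = c(2, 17)),
             rep(c(1, 0), times = c(8, 12)),
             rep(c(1, 0), times = c(3, 28)))
)

result <- df %>%
  group_by(year) %>%
  # One row per year: number of impaired cases and group size
  summarise(cogimp = sum(cogimp), n = n(), .groups = "drop") %>%
  # Run prop.test() per row and tidy each htest into a one-row tibble
  mutate(test = map2(cogimp, n,
                     ~ tidy(prop.test(.x, .y, p = 0.5, conf.level = 0.95)))) %>%
  # Spread the tidied columns (estimate, p.value, conf.low, ...) alongside
  unnest(test)

result %>%
  select(year, cogimp, n, estimate, p.value, conf.low, conf.high)
```

Keeping the tidied test as a list column before unnesting makes it easy to retain the estimate and confidence limits as well as the p-value; if only the p-value is needed, `map2_dbl()` with `prop.test(...)$p.value` would be a shorter route.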

    [Comments]:
