Run multiple chisq tests on one dataset with purrr: answers

【Question title】: Run multiple chisq-tests on one dataset with purrr
【Posted】: 2018-05-11 19:59:24
【Question description】:

I'm very new to this world. I have the following test data:

A <- tibble(parasite = sample(0:1, 10, rep = TRUE),
            L1 = sample(0:1, 10, rep = TRUE),
            L2 = sample(0:1, 10, rep = TRUE),
            L3 = sample(0:1, 10, rep = TRUE),
            L4 = sample(0:1, 10, rep = TRUE))

which looks like:

   parasite L1 L2 L3 L4 
1         0  0  1  0  0 
2         1  0  1  1  1 
3         1  1  1  0  1 
4         0  1  1  1  0 
5         1  1  1  1  0 
...10 rows total

What I would like to do is run 4 chisq tests:

1. parasite vs L1

2. parasite vs L2

3. parasite vs L3

4. parasite vs L4

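I would then like to produce a summary tibble that lists, for each table, the Y component (L1, L2, ...), the chisq value, and the p-value (rounded to something sensible). Like: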

variable  chisq  pvalue 
L1        1.475    0.0892 
L2       18.453    0.0000E8 
L3        2.4781   0.0012 
L4        0.6785   0.2755

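I have seen similar things done with map, but I could not get it to work. Since I am still learning, any tidy way of doing this would be much appreciated.

E.g.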

map(~chisq.test(.x, data$column)) %>% 
  tibble(names = names(.), data = .) %>% 
  mutate(stats = map(data, tidy)) 
unnest(data,stats)

Can anyone tell me how to do this?

Thanks!

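【Question comments】:

If you use sample() in your test data, you should call set.seed() first so that it is reproducible. Otherwise it is hard to be sure we get the same values you expect.

Tags: r purrr broom

【Solution 1】:

Here is one way: reshape the data to long form, call chisq.test inside do on the grouped data frame, then tidy the output with broom.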

library(tidyverse)

set.seed(1)
A <- tibble(parasite = sample(0:1, 10, rep = TRUE),
            L1 = sample(0:1, 10, rep = TRUE),
            L2 = sample(0:1, 10, rep = TRUE),
            L3 = sample(0:1, 10, rep = TRUE),
            L4 = sample(0:1, 10, rep = TRUE))

A %>%
    gather(key = variable, value = value, -parasite) %>%
    group_by(variable) %>%
    do(chisq.test(.$parasite, .$value) %>% broom::tidy())
#> # A tibble: 4 x 5
#> # Groups:   variable [4]
#>   variable statistic p.value parameter method                             
#>   <chr>        <dbl>   <dbl>     <int> <chr>                              
#> 1 L1        0.         1             1 Pearson's Chi-squared test         
#> 2 L2        2.93e-32   1.000         1 Pearson's Chi-squared test with Ya…
#> 3 L3        0.         1             1 Pearson's Chi-squared test         
#> 4 L4        2.34e- 1   0.628         1 Pearson's Chi-squared test with Ya…

Created on 2018-05-11 by the reprex package (v0.2.0).
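The tidied output has more columns than the question asked for. A minimal sketch of trimming it down to the requested variable / chisq / pvalue layout follows. The fixed 0/1 values are made-up stand-ins for the sampled data (chosen so that no column is constant), and the names chisq and pvalue and the rounding to three digits are assumptions taken from the question's mock-up, not something the answer prescribes. With only 10 rows, chisq.test will typically warn that the chi-squared approximation may be incorrect.

```r
library(dplyr)
library(tidyr)
library(broom)

# Hypothetical fixed data standing in for the sampled tibble in the question.
A <- tibble(
  parasite = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1),
  L1       = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  L2       = c(1, 1, 1, 1, 1, 0, 0, 1, 0, 1),
  L3       = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  L4       = c(0, 1, 1, 0, 0, 1, 1, 0, 1, 1)
)

result <- A %>%
  gather(key = variable, value = value, -parasite) %>%   # long form: one row per (row, L column)
  group_by(variable) %>%
  do(chisq.test(.$parasite, .$value) %>% tidy()) %>%     # one tidied test per L column
  ungroup() %>%
  transmute(
    variable,
    chisq  = round(statistic, 3),   # assumed rounding, adjust to taste
    pvalue = signif(p.value, 3)
  )

print(result)
```

signif() is used for the p-value so that very small values keep some significant digits instead of rounding to 0.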

【Comments】:

【Solution 2】:

It is best to reshape the data into long (tidy) format; you can then use nest() to run the test per group. For example

A %>% 
  gather("variable", "measure", -parasite) %>% 
  group_by(variable)%>% 
  nest(-variable) %>% 
  mutate(stats = map(data, ~broom::tidy(chisq.test(.$parasite, .$measure)))) %>% 
  select(-data) %>% 
  unnest()
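In tidyr 1.0 and later, gather() is superseded, and nest() and unnest() expect their columns to be named explicitly. A sketch of the same nest-and-map idea in that newer spelling might look as follows. It assumes a recent tidyr, and the fixed 0/1 values are made-up stand-ins for the sampled data.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical fixed data standing in for the sampled tibble in the question.
A <- tibble(
  parasite = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1),
  L1       = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  L2       = c(1, 1, 1, 1, 1, 0, 0, 1, 0, 1),
  L3       = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  L4       = c(0, 1, 1, 0, 0, 1, 1, 0, 1, 1)
)

stats <- A %>%
  pivot_longer(-parasite, names_to = "variable", values_to = "measure") %>%
  nest(data = c(parasite, measure)) %>%   # one nested data frame per variable
  mutate(stats = map(data, ~ tidy(chisq.test(.x$parasite, .x$measure)))) %>%
  select(-data) %>%
  unnest(stats)                           # expand the tidied results into columns

print(stats)
```

Naming the nested column in nest() and the column to expand in unnest() avoids the warnings that the unnamed forms produce in newer tidyr versions.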

You can also use do()

A %>% 
  gather("variable", "measure", -parasite) %>% 
  group_by(variable) %>% 
  do(broom::tidy(chisq.test(.$parasite, .$measure)))
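Since the question asked about map specifically, here is a sketch that skips the reshaping step and maps over the L columns directly. It is an alternative to the answers above, not their method; the helper name predictors and the fixed 0/1 values are hypothetical.

```r
library(dplyr)
library(purrr)
library(broom)

# Hypothetical fixed data standing in for the sampled tibble in the question.
A <- tibble(
  parasite = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1),
  L1       = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
  L2       = c(1, 1, 1, 1, 1, 0, 0, 1, 0, 1),
  L3       = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0),
  L4       = c(0, 1, 1, 0, 0, 1, 1, 0, 1, 1)
)

# Every column except the outcome; a tibble is a named list of columns,
# so map() visits L1..L4 in turn and keeps their names.
predictors <- select(A, -parasite)

summary_tbl <- predictors %>%
  map(~ tidy(chisq.test(A$parasite, .x))) %>%   # named list of one-row tibbles
  bind_rows(.id = "variable") %>%               # list names become the variable column
  select(variable, statistic, p.value)

print(summary_tbl)
```

The trade-off is that the outcome column is referenced as A$parasite inside the lambda, which is less pipeline-friendly than the long-format approaches but closer to the attempt in the question.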

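【Comments】:

Thank you very much for the help! It looks like my approach was a bit off. Now to do some homework on the new functions I have just come across.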