How do I perform chi-square tests between many variables and create a data frame of the results?

【Question title】: How do I perform chi-square tests between many variables and create a data frame of the results?
【Posted】: 2023-01-03 01:45:49
【Question description】:

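I'm still new to R and to data analysis in general. I have a dataset with two parts:

- 20 questions (answers on a 5-point Likert scale)
- 8 sociodemographic variables

Here is a reduced sample of the dataset (only 3 of the 20 questions and 3 of the sociodemographic variables), in case it helps: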
```
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5), 
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))
```
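1. I need to run a chi-square test between each question and every sociodemographic variable. That gives 9 chi-square results in total: Q1 - ageRange, Q1 - education, Q1 - maritalStatus, Q2 - ageRange, Q2 - education, Q2 - maritalStatus, Q3 - ageRange, Q3 - education, Q3 - maritalStatus.
2. I want to arrange the results of these pairwise tests in a data frame or matrix whose columns are the 3 sociodemographic variables and whose rows are the 3 questions. It should look something like this (with each 0 replaced by the p-value for that row/column pair):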
```
data.frame(Age = c(0, 0, 0),
           Education = c(0, 0, 0), 
           Married = c(0, 0, 0), row.names = c("Q1", "Q2", "Q3")) 
```
I tried using some of the apply functions, but I couldn't get them to work.
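For reference, here is a minimal base-R sketch of item 1 (not part of the original thread). `expand.grid()` enumerates the nine question/variable pairs, and `mapply()` runs one `chisq.test()` per pair, producing a long data frame of p-values. When `chisq.test()` receives two vectors, it coerces both to factors and tests their contingency table. The helper names `questions`, `demographics` and `pairs` are introduced here for illustration only.

```r
# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

# Hypothetical helper vectors naming the two groups of columns
questions <- c("Q1", "Q2", "Q3")
demographics <- c("ageRange", "education", "maritalStatus")

# All 3 x 3 = 9 question/variable combinations, one per row
pairs <- expand.grid(question = questions, variable = demographics,
                     stringsAsFactors = FALSE)

# One chi-square test per pair; suppressWarnings() hides the
# "approximation may be incorrect" warning caused by the tiny sample
pairs$p.value <- mapply(function(q, v) {
  suppressWarnings(chisq.test(df[[q]], df[[v]])$p.value)
}, pairs$question, pairs$variable)

pairs
```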

【Question discussion】:

What code have you tried so far?

Tags: r dataframe chi-squared

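【Solution 1】:

We could do something like this. It is quite verbose, but it may help as a starting point.

The idea is to build, for each Q column, a separate data frame containing that question together with the sociodemographic columns, run the tests on it, and then bind the per-question results together at the end.

The `tidy()` function from the broom package is very handy for turning each test result into a data frame row: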

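```
library(dplyr)
library(tidyr)
library(purrr)  # map() comes from purrr, which must be loaded too
library(broom)

# Assumes the question's sample data has been assigned to `df`.
# chisq.test() may warn "Chi-squared approximation may be incorrect"
# because the sample has only 10 rows.

# Q1: keep Q1 plus the sociodemographic columns, reshape to long format,
# nest one data frame per variable and run one chi-square test on each
Q1 <- df %>% 
  select(-Q2, -Q3) %>% 
  pivot_longer(-Q1) %>%   # columns: Q1, name, value
  group_by(name) %>% 
  nest() %>%              # one row per variable, its rows in list-column `data`
  mutate(stats = map(data, ~ broom::tidy(chisq.test(.x$Q1, .x$value)))) %>% 
  select(-data) %>% 
  unnest(stats) %>% 
  ungroup()

# Q2: same steps with Q2 as the question column
Q2 <- df %>% 
  select(-Q1, -Q3) %>% 
  pivot_longer(-Q2) %>% 
  group_by(name) %>% 
  nest() %>% 
  mutate(stats = map(data, ~ broom::tidy(chisq.test(.x$Q2, .x$value)))) %>% 
  select(-data) %>% 
  unnest(stats) %>% 
  ungroup()

# Q3: same steps with Q3 as the question column
Q3 <- df %>% 
  select(-Q1, -Q2) %>% 
  pivot_longer(-Q3) %>% 
  group_by(name) %>% 
  nest() %>% 
  mutate(stats = map(data, ~ broom::tidy(chisq.test(.x$Q3, .x$value)))) %>% 
  select(-data) %>% 
  unnest(stats) %>% 
  ungroup()

# Stack the results; .id labels them "1", "2", "3", which become "Q1", "Q2", "Q3"
bind_rows(Q1, Q2, Q3, .id = "Q") %>% 
  mutate(ID = paste0("Q", Q), .before = 1, .keep = "unused")
```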

```
  ID    name          statistic p.value parameter method                    
  <chr> <chr>             <dbl>   <dbl>     <int> <chr>                     
1 Q1    ageRange          15.6    0.209        12 Pearson's Chi-squared test
2 Q1    education         27.5    0.122        20 Pearson's Chi-squared test
3 Q1    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
4 Q2    ageRange          15.6    0.209        12 Pearson's Chi-squared test
5 Q2    education         20.8    0.407        20 Pearson's Chi-squared test
6 Q2    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
7 Q3    ageRange          14.6    0.265        12 Pearson's Chi-squared test
8 Q3    education         21.7    0.359        20 Pearson's Chi-squared test
9 Q3    maritalStatus      3.06   0.549         4 Pearson's Chi-squared test
```
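The three blocks above differ only in which Q column they use, and their result is in long format rather than the question-by-variable layout the question asked for. The following is a minimal base-R sketch of an alternative (not the answerer's method). It uses nested `sapply()` calls, the kind of apply-based approach mentioned in the question, to loop over every question column and return the p-values directly in that layout. It assumes the question columns are exactly those whose names start with "Q". The labels `Age`, `Education` and `Married` come from the question's template and assume the variable order shown in the sample data.

```r
# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

# Assumption: question columns are named Q1, Q2, ...; everything else
# is treated as a sociodemographic variable
questions <- grep("^Q", names(df), value = TRUE)
demographics <- setdiff(names(df), questions)

# The outer sapply() builds one column per variable and the inner one
# builds one row per question, giving a questions x variables matrix
p_values <- sapply(demographics, function(v) {
  sapply(questions, function(q) {
    suppressWarnings(chisq.test(df[[q]], df[[v]])$p.value)
  })
})

# Convert to a data frame and apply the labels from the question's template
# (assumes the variables appear in the order ageRange, education, maritalStatus)
result <- as.data.frame(p_values)
names(result) <- c("Age", "Education", "Married")
result
```

On the full survey, the same code should return a 20 x 8 table as long as the question columns follow the `Q` naming pattern and the label vector is extended to eight names.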

【Discussion】: