Perform a Fisher test comparing multiple data frame columns to the same vector in R

【Question title】: perform Fisher test comparing multiple dataframe columns to the same vector R
【Posted】: 2022-07-22 01:14:19
【Question】:

I have a data frame:

frequencies <- data.frame(row.names = c("a", "b", "c")
                          ,response = c(10, 7, 4)
                          ,no_response = c(12, 12, 7))

> frequencies
  response no_response
a       10          12
b        7          12
c        4           7

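I want to run Fisher's exact test comparing each row against the column totals for the experiment. In other words, I want to know whether the frequencies observed in any of the a/b/c subsets differ from the frequencies observed for the whole data set.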

To do it "by hand", I count how many observations there are in each column:

library(magrittr)  # provides the %>% pipe used below

total <- colSums(frequencies) %>% 
  t() %>% 
  as.data.frame() %>% 
  `rownames<-`("total")

> total
      response no_response
total       21          31

Then I run fisher.test() (I only need the p-value), comparing each row of frequencies with total[1,]:

ap <- fisher.test(rbind(total[1,], frequencies[1,]))$p.value
bp <- fisher.test(rbind(total[1,], frequencies[2,]))$p.value

and so on.

There must be a neater way. In the final output, I would like the frequencies data frame to have a column containing the p-values, like this:

  response no_response  pval
a       10          12   0.8
b        7          12     1
c        4           7     1

I added the purrr tag because I feel I should be using map here, but I don't know how.

【Comments】:

Tags: r dataframe dplyr purrr chi-squared

【Solution 1】:

You can try something simple like this with dplyr:

library(dplyr)

total <- frequencies %>%
  summarise(across(everything(), sum))

frequencies %>%
  rowwise() %>%
  mutate(pval = stats::fisher.test(rbind(total, c(response, no_response)))$p.value) %>%
  ungroup()
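
One thing to watch for: rowwise() returns a tibble, so the row names a/b/c are typically not carried into the result. A minimal sketch of one way to keep them, assuming the tibble package is available; the column name `group` is made up for illustration, and `total` is taken here as the named vector from colSums():

```r
library(dplyr)
library(tibble)

frequencies <- data.frame(row.names = c("a", "b", "c"),
                          response = c(10, 7, 4),
                          no_response = c(12, 12, 7))

# Column totals as a named numeric vector: response = 21, no_response = 31
total <- colSums(frequencies)

result <- frequencies %>%
  rownames_to_column("group") %>%   # keep a/b/c as a regular column (hypothetical name)
  rowwise() %>%
  # Build a 2x2 matrix (totals on top, current row below) and keep only the p-value
  mutate(pval = fisher.test(rbind(total, c(response, no_response)))$p.value) %>%
  ungroup()

result
```

If the row names are needed back as row names, tibble::column_to_rownames("group") can be applied to the result.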

【Comments】:

【Solution 2】:

Base R:

Using a for loop:

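# Pre-allocate the p-value column, then fill it row by row
frequencies$p.value <- NA_real_
for (i in seq_len(nrow(frequencies))) {
  # Compare the column totals with the counts in row i (first two columns only)
  frequencies$p.value[i] <- fisher.test(rbind(total[1, ], frequencies[i, 1:2]))$p.value
}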

Or using apply:

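# Return the p-value for one row of counts compared against the column totals
rbindtest <- function(x) {
  fisher.test(rbind(total[1, ], x))$p.value
}
# Pass only the two count columns so an existing p.value column is not included
frequencies$p.value <- apply(frequencies[, c("response", "no_response")], 1, rbindtest)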
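
Since the question mentions purrr, the same row-by-row comparison can also be sketched with purrr::map2_dbl(). This is only an illustration, not part of the answer above: it assumes `total` is the named vector returned by colSums(frequencies), and it writes to a column called `pval` as in the desired output.

```r
library(purrr)

frequencies <- data.frame(row.names = c("a", "b", "c"),
                          response = c(10, 7, 4),
                          no_response = c(12, 12, 7))

# Column totals as a named numeric vector: response = 21, no_response = 31
total <- colSums(frequencies)

# Walk over both count columns in parallel; each call builds a 2x2 matrix
# (totals on top, one row's counts below) and returns the p-value as a double
frequencies$pval <- map2_dbl(
  frequencies$response,
  frequencies$no_response,
  function(r, n) fisher.test(rbind(total, c(r, n)))$p.value
)

frequencies
```

Because the columns are modified in place on the data frame, the row names a/b/c are kept.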

【Comments】: