FISHER 精确检验和 GLM 之间的 P 值差异答案

【问题标题】：P value difference between FISHER exact test and GLMFISHER 精确检验和 GLM 之间的 P 值差异
【发布时间】：2020-04-07 09:28:27
【问题描述】：

我有一个日期集 plink.raw，我正在测试制造商 CN00020133（具有三个级别 0、1、2）是否与 phenotype5 相关联。我想使用 GLM 或 Fisher 提取测试比较 0 vs 1 和 2 vs 1：

    table(plink.raw$phenotype5,plink.raw$CN00020133)
              1     0     2
        0  3559     0     7
        1 14806    54   123

使用 GLM 测试，我可以看到 0 与 1 的 p 值为 0.912894。

plink.raw$CN00020133 <- factor(plink.raw$CN00020133, levels=c("1","0","2"))
univariate=glm(phenotype5 ~ relevel(CN00020133,ref ="1"), family = binomial, data = plink.raw)
summary(univariate)

呼叫：

glm(formula = phenotype5 ~ relevel(CN00020133, ref = "1"), 
    family = binomial, data = plink.raw)

偏差残差：

    Min       1Q   Median       3Q      Max  
-2.4173   0.6564   0.6564   0.6564   0.6564

系数：

                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         1.42555    0.01867  76.361  < 2e-16 ***
relevel(CN00020133, ref = "1")0  13.14051  120.12616   0.109 0.912894    
relevel(CN00020133, ref = "1")2   1.44072    0.38902   3.703 0.000213 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 18158  on 18548  degrees of freedom
Residual deviance: 18114  on 18546  degrees of freedom
AIC: 18120

Number of Fisher Scoring iterations: 13

但如果我使用 Fisher 精确检验对其进行测试，0 与 1 的 p 值为 4.618e-06。

Convictions <- matrix(c(0, 54, 3559, 14806), nrow = 2,dimnames = list(c("control", "case"),c("del/dup", "normal_copy")))
fisher.test(Convictions, alternative = "less")

Fisher's Exact Test for Count Data
data:  Convictions
p-value = 9.048e-06
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.0000000 0.2374411
sample estimates:
odds ratio 0

【问题讨论】：

标签： r glm p-value

【解决方案1】：

您有一个完全分离的情况，也称为 Hauck-Donner 效应。如果你CN00020133==0，它们的表型都是1，这使得很难估计系数的标准误差。上面有相当多的材料，例如Alexej's blog、post by Brian Ripley、Ben Bolker's notes。

如果您需要检验“1”效应的显着性，一种解决方案是使用似然比检验：

df =  rbind(
data.frame(phenotype=rep(0:1,c(3559,14806)),CN00020133="1"),
data.frame(phenotype=rep(0:1,c(0,54)),CN00020133="0")
)

anova(glm(phenotype ~ CN00020133,data=df,family=binomial),test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: phenotype

Terms added sequentially (first to last)


           Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
NULL                       18418      18082             
CN00020133  1   23.227     18417      18059 1.44e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

与 Fisher 检验相比，这为您提供了不同的 p 值，因为它是不同的分布（二项式与超几何）。但或多或少可以得出结论，使用“0”作为参考，“1”的附加效果。

在R中有一个Firth的逻辑回归的实现，你可以试试看，但我必须说我对这个不是很熟悉：

library("logistf")
logistf(phenotype ~ CN00020133,data=df,family=binomial)

【讨论】：