【Question title】: Excel statistics: How to calculate the p-value of a 2x2 contingency table?
【Posted】: 2020-07-04 15:02:22
【Question】:

Given data such as:

        A         B           C
1               Group 1     Group 2
2   Property 1     56         651
3   Property 2     97       1,380

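How can the p-value (i.e., the "right-tail" probability of the chi-squared distribution) be calculated directly, without setting up a separate calculation of the table's expected values?

In Excel, the p-value is returned by the function ChiSq.dist.RT if you know the table's chi-squared value, or by ChiSq.Test if you know the table's "expected values". The chi-squared value is computed from the expected values, and the expected values are computed from the original table by a somewhat complicated formula. Either way, Excel requires us to work out the expected values ourselves before it will give a p-value, which seems a bit silly. So, how can I get the p-value in Excel without calculating the expected values separately?

Edit: This question was originally posted under the title "How to calculate the Pearson correlation coefficient with a 2-property array?" and asked why the function pearson gave the wrong answer. The answer is that I had confused the p-value with the Pearson correlation coefficient, which are different things. So I have rewritten the question to ask what I actually need to know, and posted an answer. I'll wait a while before accepting my own answer, in case someone else has a better one.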

【Question discussion】:

    Tags: excel-formula p-value chi-squared


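    【Solution 1】:

    It seems to me that this requires VBA. I wrote the following VBA function, which calculates chi-squared or the p-value, as well as two other measures of association, for a 2x2 contingency table: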

    Public Function nStatAssoc_2x2(sType As String, nGrp1PropCounts As Range, nGrp2PropCounts As Range) As Single
    
    ' Return one of several measures of statistical association of a 2×2 contingency table:
    '                   Property 1      Property 2
    '       Group 1     nCount(1, 1)    nCount(1, 2)
    '       Group 2     nCount(2, 1)    nCount(2, 2)
    
    ' sType is:     to calculate:
    '   "OR"        Odds ratio
    '   "phi"       Phi coefficient
    '   "chi-sq"    Chi-squared
    '   "p"         p-value, i.e., right-tailed probability of the chi-squared distribution
    
    ' nGrp<n>PropCounts is a range of two cells containing the number of members of group n that have each of two properties.
    ' These arguments are 1-D arrays in order to allow the data to appear in non-adjacent ranges in the spreadsheet.
    
    ' References:
        ' Contingency table:        https://en.wikipedia.org/wiki/Contingency_table
        ' Measure of association:   www.britannica.com/topic/measure-of-association
        ' Odds ratio:               https://en.wikipedia.org/wiki/Odds_ratio
        '                           https://en.wikipedia.org/wiki/Effect_size#Odds_ratio
        ' Phi coefficient:          https://en.wikipedia.org/wiki/Phi_coefficient
        ' Chi-sq:                   https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Calculating_the_test-statistic
        '                           www.mathsisfun.com/data/chi-square-test.html
        '                               Shows calculation of expected values.
        ' p-value:                  https://docs.microsoft.com/en-us/office/vba/api/excel.worksheetfunction.ChiSq_Dist_RT
    
    ' Long rather than Integer: VBA's Integer is 16-bit, so counts or totals above 32,767 would overflow.
    Dim nCount(1 To 2, 1 To 2) As Long
    Dim nSumGrp(1 To 2) As Long, nSumProp(1 To 2) As Long, nSumAll As Long
    Dim nExpect(1 To 2, 1 To 2) As Single
    Dim nIndex1 As Byte, nIndex2 As Byte
    Dim nRetVal As Single
    
    ' Combine input arguments into contingency table:
    For nIndex1 = 1 To 2
        nCount(1, nIndex1) = nGrp1PropCounts(nIndex1)
        nCount(2, nIndex1) = nGrp2PropCounts(nIndex1)
      Next nIndex1
    
    ' Calculate totals of group counts, property counts, and all counts (used for phi and chi-sq):
    For nIndex1 = 1 To 2
        For nIndex2 = 1 To 2
            nSumGrp(nIndex1) = nSumGrp(nIndex1) + nCount(nIndex1, nIndex2)
            nSumProp(nIndex2) = nSumProp(nIndex2) + nCount(nIndex1, nIndex2)
          Next nIndex2
      Next nIndex1
    nSumAll = nSumGrp(1) + nSumGrp(2)
    
    If nSumAll <> nSumProp(1) + nSumProp(2) Then
        nRetVal = -2           ' Error: Sums differ.
        GoTo Finished
      End If
    
    Select Case sType
    
        ' Odds ratio
        Case "OR":
            nRetVal = (nCount(1, 1) / nCount(1, 2)) / (nCount(2, 1) / nCount(2, 2))
            If nRetVal <> (nCount(1, 1) / nCount(2, 1)) / (nCount(1, 2) / nCount(2, 2)) Then
                nRetVal = -3            ' Error: OR calculation results differ.
                GoTo Finished
              End If
    
        ' Phi coefficient
        Case "phi":
            ' CDbl keeps every product in floating point, so large counts cannot overflow an integer type.
            nRetVal = ((CDbl(nCount(1, 1)) * nCount(2, 2)) - (CDbl(nCount(1, 2)) * nCount(2, 1))) / _
                        (CDbl(nSumGrp(1)) * nSumGrp(2) * nSumProp(1) * nSumProp(2)) ^ 0.5
    
        ' Chi-squared
        Case "chi-sq", "p":     ' For "p", nRetVal is passed to the next select case statement.
            ' Calculate table of expected values:
            For nIndex1 = 1 To 2
                For nIndex2 = 1 To 2
                        ' In next line, the division is done first to prevent integer overflow,
                        '   which can happen if the multiplication is done first.
                    nExpect(nIndex1, nIndex2) = nSumGrp(nIndex1) / nSumAll * nSumProp(nIndex2)
                    If nExpect(nIndex1, nIndex2) < 5 Then
                        ' https://en.wikipedia.org/wiki/Pearson's_chi-squared_test#Assumptions
                        nRetVal = -4        ' Error: Expected value too small.
                        GoTo Finished
                      Else
                        nRetVal = nRetVal + _
                            (nCount(nIndex1, nIndex2) - nExpect(nIndex1, nIndex2)) ^ 2 / nExpect(nIndex1, nIndex2)
                      End If
                  Next nIndex2
              Next nIndex1
    
        Case Else:
            nRetVal = -1           ' Error: Invalid measure type.
            GoTo Finished
      End Select
    
    Select Case sType
        Case "OR", "phi", "chi-sq":
    
        ' p-value       ' Uses value of nRetVal passed from the previous select case statement.
        Case "p": nRetVal = WorksheetFunction.ChiSq_Dist_RT(nRetVal, 1)
      End Select
    
    Finished: nStatAssoc_2x2 = nRetVal
    
    End Function        ' nStatAssoc_2x2()
    

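    The function was tested in Excel 2019 and produced correct values for all four measures on several test tables. Criticism or suggestions for improving the code are welcome.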
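
For reference, the quantity that the "chi-sq" branch accumulates has a closed form for a 2x2 table, which is why no separate table of expected values is strictly needed. The block below is a sketch of that identity. The symbols O, E, R, C and N are notation introduced here, not names from the code, and the indices follow the function's nCount(group, property) layout. No continuity correction is applied, matching the function above.

```latex
% Assumed notation: O_{ij} = observed count for group i, property j;
% R_i = total for group i, C_j = total for property j, N = grand total.
\[
E_{ij} = \frac{R_i\,C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}
       = \frac{N\,(O_{11}O_{22}-O_{12}O_{21})^2}{R_1 R_2 C_1 C_2}
       = N\,\varphi^2 .
\]
% Table from the question: O_{11}=56, O_{12}=97, O_{21}=651, O_{22}=1380,
% so R_1=153, R_2=2031, C_1=707, C_2=1477, N=2184.
\[
\chi^2 = \frac{2184\,(56\cdot 1380 - 97\cdot 651)^2}{153\cdot 2031\cdot 707\cdot 1477}
       = \frac{2184\cdot 14133^2}{153\cdot 2031\cdot 707\cdot 1477}
       \approx 1.344,
\qquad
p = P\!\left(\chi^2_{1} \ge 1.344\right) \approx 0.246 .
\]
```

With the layout shown in the question, each range argument holds one group's two property counts, so the corresponding worksheet call would be =nStatAssoc_2x2("p", B2:B3, C2:C3).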

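    If I'm wrong and this doesn't require VBA, or if there is a better way to do this for any other reason, please post a different answer. As I said in the edit note on the question, I'll wait a while before accepting my answer, to see whether anyone else has a better one.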
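
As an independent check outside Excel, the same chi-squared statistic and p-value can be reproduced in a few lines of Python using only the standard library. This is a minimal sketch, not part of the original answer: the function names are hypothetical, all four marginal totals are assumed to be non-zero, and the p-value relies on the fact that for one degree of freedom the right-tail probability equals erfc(sqrt(x / 2)).

```python
import math


def chi_sq_via_expected(a, b, c, d):
    """Pearson chi-squared for [[a, b], [c, d]] using explicit expected values."""
    observed = [[a, b], [c, d]]
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    total = a + b + c + d
    chi_sq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_sq += (observed[i][j] - expected) ** 2 / expected
    return chi_sq


def chi_sq_closed_form(a, b, c, d):
    """Same statistic from the 2x2 shortcut formula (no continuity correction)."""
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi_sq):
    """Right-tail probability of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


if __name__ == "__main__":
    # Counts from the table in the question, read row by row.
    stat = chi_sq_closed_form(56, 651, 97, 1380)
    print(f"chi-squared = {stat:.4f}, p = {p_value_df1(stat):.4f}")
```

Both routes give the same statistic, so comparing them against the VBA function's "chi-sq" and "p" outputs is a quick way to validate a test table.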

    【Discussion】:
