【Question title】: Difference between Excel and SciPy cumulative binomial distribution p-values?
【Posted】: 2017-03-07 21:23:56
【Question description】:

I have this table (NumSucc = number of successes, NumTrials = number of trials, Prob = probability of success):

Gene    NumSucc NumTrials   Prob
Gene1   16       26        0.9548
Gene2   16       26        0.9548
Gene3   12       21        0.9548
Gene4   17       27        0.9548
Gene5   17       27        0.9548
Gene6   17       27        0.9548
Gene7   8        15        0.9548
Gene8   10       17        0.9548

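I want the cumulative binomial distribution p-value for each row. When I put this exact table into Excel columns A-D and then enter the following function in column E (for row 2, for example):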

=BINOMDIST(B2,C2,D2,1)

the output table looks like this:

Gene    NumSucc NumTrials   Prob    Binomial
Gene1   16  26  0.9548  9.68009E-08
Gene2   16  26  0.9548  9.68009E-08
Gene3   12  21  0.9548  1.40794E-07
Gene4   17  27  0.9548  1.47463E-07
Gene5   17  27  0.9548  1.47463E-07
Gene6   17  27  0.9548  1.47463E-07
Gene7   8   15  0.9548  1.79741E-06
Gene8   10  17  0.9548  5.01334E-06

Alternatively, when I run this exact table through SciPy with the following code:

import glob
import os
import scipy
from scipy.stats.distributions import binom
import sys

def WriteBinomial(InputFile,output):
    open_input_file  = open(InputFile, 'r').readlines()[1:]
    for line in open_input_file:
        line = line.strip().split()
        GeneName,num_succ,num_trials,prob = line[0],int(line[1]),int(line[2]),float(line[3])
        print(GeneName + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str((binom.cdf(num_succ-1, num_trials, prob))))


WriteBinomial(sys.argv[1],sys.argv[2])

the output is:

GeneName    NumSucc NumTrials   Prob    Binomial
Gene1   16  26  0.9548  6.59829603211e-09
Gene2   16  26  0.9548  6.59829603211e-09
Gene3   12  21  0.9548  7.92014917046e-09
Gene4   17  27  0.9548  1.06754559723e-08
Gene5   17  27  0.9548  1.06754559723e-08
Gene6   17  27  0.9548  1.06754559723e-08
Gene7   8   15  0.9548  8.41770305586e-08
Gene8   10  17  0.9548  2.93060582331e-07

Does anyone know why the two methods give different results?

【Question comments】:

    Tags: python excel scipy binomial-cdf


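    【Solution 1】:

    Your Python code passes "num_succ-1", whereas the Excel formula passes "B2" with no "-1", so the two calls evaluate the cumulative distribution at different points.

    Python: binom.cdf(num_succ-1, num_trials, prob)
    Excel:  =BINOMDIST(B2,C2,D2,1)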
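
    To make the size of the gap concrete, here is the identity behind it, written with assumed symbols that do not appear in the original post: X for the number of successes, n for NumTrials, p for Prob, and k for NumSucc. Both functions return a cumulative probability of the form P(X ≤ ·), so evaluating at k versus k−1 differs by exactly the probability of k successes:

```latex
\[
P(X \le k) - P(X \le k-1) \;=\; P(X = k) \;=\; \binom{n}{k}\, p^{k} (1-p)^{n-k}
\]
```

    For Gene1 (k = 16, n = 26, p = 0.9548) the two tables in the question give 9.68009E-08 − 6.59830E-09 ≈ 9.0203E-08, which is the single term P(X = 16).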

    The code below should produce the same output as Excel.

    import sys
    from scipy.stats import binom

    def WriteBinomial(InputFile, output):
        # 'output' is kept from the original signature but is not used.
        with open(InputFile, 'r') as handle:
            rows = handle.readlines()[1:]  # skip the header row
        for line in rows:
            line = line.strip().split()
            GeneName, num_succ, num_trials, prob = line[0], int(line[1]), int(line[2]), float(line[3])
            # binom.cdf(k, n, p) is P(X <= k), the same quantity as BINOMDIST(k, n, p, 1)
            p_value = binom.cdf(num_succ, num_trials, prob)
            print(GeneName + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str(p_value))


    WriteBinomial(sys.argv[1], sys.argv[2])
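
As a rough cross-check that does not need SciPy or Excel, the sketch below recomputes both columns from the definition of the cumulative binomial distribution, using only the standard library. The helper names `binom_pmf` and `binom_cdf` are hypothetical (they are not SciPy functions), the rows are copied from the table in the question, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): the pmf summed over 0..k inclusive."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# (Gene, NumSucc, NumTrials, Prob) rows copied from the question's table
rows = [
    ("Gene1", 16, 26, 0.9548),
    ("Gene3", 12, 21, 0.9548),
    ("Gene4", 17, 27, 0.9548),
    ("Gene7", 8, 15, 0.9548),
    ("Gene8", 10, 17, 0.9548),
]

for gene, k, n, p in rows:
    at_k = binom_cdf(k, n, p)            # same argument as BINOMDIST(B2,C2,D2,1)
    at_k_minus_1 = binom_cdf(k - 1, n, p)  # same argument as binom.cdf(num_succ-1, ...)
    # The gap between the two is exactly the single term P(X == k).
    assert isclose(at_k - at_k_minus_1, binom_pmf(k, n, p), rel_tol=1e-9)
    print(gene, at_k, at_k_minus_1)
```

The first printed number for each gene should line up with the Excel column and the second with the original SciPy column, up to rounding in the displayed digits.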
    
