【Question title】:Chi squared test of a distribution in Python
【Posted】:2023-03-25 18:25:02
【Question】:

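I have two questions about this exercise:

The first part of the code works perfectly. Now I need to use a chi-squared test to check whether the distribution is flat.

The code I implemented is: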

#UNIFORM RANDOM SAMPLING 

import numpy as np                #library needed for numerical calculations
import matplotlib.pyplot as plt   #library needed for plotting purposes
from scipy.stats import chisquare #function needed for chi square test

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i
m=10**3                            #number of 10**3 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(m):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 

list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(m))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(m/bins1)*np.ones(bins1)      #expected frequency, expressed in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #create a new figure
plt.hist(list1[0:m],bins=bins1)                                       #compute and draw the histogram of x with n bins
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the x-axis
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis 
plt.title('Uniform distribution: number of elements $10^{3}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

i=np.uintc(987654321)              #unsigned int variable i

n=10**6                            #number of 10**6 events

list1=[i]                          #list1 needed to be updated with random i

for count in range(n):             #for cycle over expected period and update i
    i=np.uintc(i*663608941)
    list1.append(i) 
    
list1=np.divide(list1,(2**32)-1)    #needed in order to normalize the list1 elements
bins1=int(np.sqrt(n))               #histogram bin numbers

[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset

#*******************************************************************************

f_exp=(n/bins1)*np.ones(bins1)      #expected frequency, expressed in array form. 
                                    #we define an array of ones of the exact size 
                                    #as the number of bins, and then just multiply 
                                    #it with n/N where n is number of elements
                                    #and N is number of bins. 
                                    #So it will look like [n/N,n/N,n/N...]

chisquareval=chisquare(hist,f_exp,axis=0)        #Calculate a one-way chi-square test.
                                                 #The chi-square test tests the null hypothesis 
                                                 #that the categorical data has the given frequencies.
                                                 #It needs: Observed frequencies in each category, Expected frequencies in each category

print("\n")
print("The result of chi squared test is:", chisquareval, "\n")

#*******************************************************************************

plt.figure()                                                          #create a new figure
plt.hist(list1[0:n],bins=bins1)                                       #compute and draw the histogram of x with n bins 
plt.grid()                                                            #configure the grid lines
plt.xlabel('Bins',fontweight='bold')                                  #set the label for the x-axis
plt.ylabel('Frequency',fontweight='bold')                             #set the label for the y-axis   
plt.title('Uniform distribution: number of elements $10^{6}$')        #set a title for the hist
plt.show()                                                            #display all open figures

#*******************************************************************************

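The output is as follows:

For the first histogram everything works fine. For the second histogram the value of the statistic looks wrong. I ran the same code earlier and the result was 103, but I have not changed the code!

  • Why does this happen?
  • Why does the printed output of chisquare look so ugly?
Power_divergenceResult(statistic=32.315000000000005, pvalue=0.35302378840079285) 

Can statistic and pvalue be printed separately?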
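
On printing the two values separately: the object returned by `scipy.stats.chisquare` exposes `statistic` and `pvalue` attributes and can also be unpacked like a tuple. A minimal sketch, using made-up observed counts in place of `hist` (the variable names `observed` and `expected` are illustrative, not from the code above):

```python
import numpy as np
from scipy.stats import chisquare

# Illustrative observed counts (not the data from the question).
observed = np.array([18, 22, 20, 19, 21])
# Flat expectation with the same total as the observed counts.
expected = np.full(observed.size, observed.sum() / observed.size)

result = chisquare(observed, expected)

# Access the two fields by name ...
print(f"chi2    = {result.statistic:.3f}")
print(f"p-value = {result.pvalue:.3f}")

# ... or unpack the result like a tuple.
stat, pvalue = chisquare(observed, expected)
print(f"chi2 = {stat:.3f}, p-value = {pvalue:.3f}")
```

Formatting the fields with an f-string also avoids the long floating-point tail seen in the default `Power_divergenceResult(...)` printout.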

【Comments】:

  • Any ideas? I tried rewriting the code, but nothing changed.
  • If I change the number of bins (bins1) to 100, the code works! Why?
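
To compare different bin counts under identical conditions, the generator and the test can be wrapped in small functions. This is only a sketch under stated assumptions: it reuses the seed, multiplier and normalization from the question, the helper names `lcg_uniform` and `flatness_test` are hypothetical, the sample size is reduced to 10**5 to keep it quick, the histogram range is fixed to [0, 1], and the expected counts are derived from the actual number of samples so that observed and expected totals agree (recent SciPy releases check this and raise an error when the totals differ).

```python
import numpy as np
from scipy.stats import chisquare

def lcg_uniform(seed, size, multiplier=663608941):
    """Multiplicative congruential generator modulo 2**32, scaled to (0, 1]."""
    values = np.empty(size)
    state = seed
    for k in range(size):
        # Same wrap-around as unsigned 32-bit overflow in the question's code.
        state = (state * multiplier) % 2**32
        values[k] = state / (2**32 - 1)
    return values

def flatness_test(samples, n_bins):
    """Chi-squared test of the binned samples against a flat expectation."""
    observed, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0))
    # Expected counts use the real sample count, so the totals match.
    expected = np.full(n_bins, samples.size / n_bins)
    return chisquare(observed, expected)

samples = lcg_uniform(seed=987654321, size=10**5)
for n_bins in (31, 100, 316):
    result = flatness_test(samples, n_bins)
    print(f"bins={n_bins:4d}  chi2={result.statistic:9.3f}  p={result.pvalue:.3f}")
```

Note that the question's code histograms all `m+1` (or `n+1`) values, including the seed, while `f_exp` sums to `m` (or `n`); deriving the expectation from `samples.size` sidesteps that mismatch.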

Tags: python chi-squared


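【Solution 1】:

I understand the problem now. The code is correct, but without a table for 1000 degrees of freedom (ddof=1000) I cannot say whether the null hypothesis holds. So, taking bins=ddof=100, I can compare the result of the chi-squared test with the tabulated values, and in this case I can say that the hypothesis is rejected with an error of 2%.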
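
Printed tables are not strictly needed for a large number of degrees of freedom: `scipy.stats.chi2` can return critical values and tail probabilities directly. A sketch, assuming a flat expectation with no fitted parameters, so that the degrees of freedom equal the number of bins minus one; the helper name `flatness_decision` and the input numbers are hypothetical and chosen only for illustration:

```python
from scipy.stats import chi2

def flatness_decision(statistic, n_bins, alpha=0.05):
    """Return (critical value, p-value, reject?) for a chi-squared flatness test."""
    dof = n_bins - 1                       # assumes no parameters were fitted
    critical = chi2.ppf(1.0 - alpha, dof)  # exceeded with probability alpha under H0
    p_value = chi2.sf(statistic, dof)      # P(chi2 >= statistic) under H0
    return critical, p_value, bool(statistic > critical)

# Hypothetical inputs, not results taken from the thread.
critical, p_value, reject = flatness_decision(statistic=103.0, n_bins=100)
print(f"critical value at 5%: {critical:.2f}")
print(f"p-value: {p_value:.3f}")
print("reject flatness" if reject else "no evidence against flatness")
```

The p-value returned by `chisquare` in the question is the same kind of tail probability, computed with `len(hist) - 1` degrees of freedom by default.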

【Comments】:
