【发布时间】:2023-03-25 18:25:02
【问题描述】:
我对这个练习有两个疑问:
代码的第一部分完美运行。现在我需要使用卡方检验检查分布是否平坦。
我实现的代码是:
#UNIFORM RANDOM SAMPLING
import numpy as np #library needed for numerical calculations
import matplotlib.pyplot as plt #library needed for plotting purposes
from scipy.stats import chisquare #function needed for chi square test
#*******************************************************************************
i=np.uintc(987654321) #unsigned int variable i
m=10**3 #number of 10**3 events
list1=[i] #list1 needed to be updated with random i
for count in range(m): #for cycle over expected period and update i
i=np.uintc(i*663608941)
list1.append(i)
list1=np.divide(list1,(2**32)-1) #needed in order to normalize the list1 elements
bins1=int(np.sqrt(m)) #histogram bin numbers
[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset
#*******************************************************************************
f_exp=(m/bins1)*np.ones(bins1) #expected frequency, expresses in array form.
#we define an array of ones of the exact size
#as the number of bins, and then just multiply
#it with n/N where n is number of elements
#and N is number of bins.
#So it will look like [n/N,n/N,n/N...]
chisquareval=chisquare(hist,f_exp,axis=0) #Calculate a one-way chi-square test.
#The chi-square test tests the null hypothesis
#that the categorical data has the given frequencies.
#It needs: Observed frequencies in each category, Expected frequencies in each category
print("\n")
print("The result of chi squared test is:", chisquareval, "\n")
#*******************************************************************************
plt.figure() #a unique identifier for the figure
plt.hist(list1[0:m],bins=bins1) #compute and draw the histogram of x with n bins
plt.grid() #configure the grid lines
plt.xlabel('Bins',fontweight='bold') #set the label for the y-axis
plt.ylabel('Frequency',fontweight='bold') #set the label for the y-axis
plt.title('Uniform distribution: number of elements $10^{3}$') #set a title for the hist
plt.show() #display all open figures
#*******************************************************************************
i=np.uintc(987654321) #unsigned int variable i
n=10**6 #number of 10**6 events
list1=[i] #list1 needed to be updated with random i
for count in range(n): #for cycle over expected period and update i
i=np.uintc(i*663608941)
list1.append(i)
list1=np.divide(list1,(2**32)-1) #needed in order to normalize the list1 elements
bins1=int(np.sqrt(n)) #histogram bin numbers
[hist,bin_edges]=np.histogram(list1,bins=bins1) #compute the histogram of a dataset
#*******************************************************************************
f_exp=(n/bins1)*np.ones(bins1) #expected frequency, expresses in array form.
#we define an array of ones of the exact size
#as the number of bins, and then just multiply
#it with n/N where n is number of elements
#and N is number of bins.
#So it will look like [n/N,n/N,n/N...]
chisquareval=chisquare(hist,f_exp,axis=0) #Calculate a one-way chi-square test.
#The chi-square test tests the null hypothesis
#that the categorical data has the given frequencies.
#It needs: Observed frequencies in each category, Expected frequencies in each category
print("\n")
print("The result of chi squared test is:", chisquareval, "\n")
#*******************************************************************************
plt.figure() #a unique identifier for the figure
plt.hist(list1[0:n],bins=bins1) #compute and draw the histogram of x with n bins
plt.grid() #configure the grid lines
plt.xlabel('Bins',fontweight='bold') #set the label for the x-axis
plt.ylabel('Frequency',fontweight='bold') #set the label for the y-axis
plt.title('Uniform distribution: number of elements $10^{6}$') #set a title for the hist
plt.show() #display all open figures
#*******************************************************************************
输出如下:
对于第一个直方图,一切正常,有效。在第二个直方图中,我们可以看到统计数据的错误值。我之前跑过那个代码,结果是103,但是我没有改代码!
- 为什么会这样?
- 为什么 chisquare 的显示输出看起来很糟糕?
Power_divergenceResult(statistic=32.315000000000005, pvalue=0.35302378840079285)
statistc和pvalue可以分开打印吗?
【问题讨论】:
-
有什么想法吗?我试图重新编写代码,但没有任何改变。
-
如果我将 bins1 的数量更改为 100,则代码有效!为什么?
标签: python chi-squared