【Question Title】: Chi-Squared Test
【Posted】: 2023-02-01 22:27:16
【Question Description】:

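I have two categorical features and need to run a chi-squared test on them. I can't figure out how to use the chi-squared tests available in the existing modules. Could you help me write a function that returns the p-value so I can test the null hypothesis?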

【Question Discussion】:

    Tags: python chi-squared statistical-test


    【Solution 1】:

    Here is a function that computes the chi-squared test of independence between two categorical pandas Series (one per feature).
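For reference, this is the standard Pearson chi-squared test of independence. Writing $O_{ij}$ for the observed count in row $i$, column $j$ of the $r \times c$ contingency table, $R_i$ and $C_j$ for the row and column totals, and $N$ for the grand total (notation introduced here for illustration), the expected counts, statistic, degrees of freedom, and p-value are:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\nu = (r - 1)(c - 1), \qquad
p = P\left(\chi^2_{\nu} \ge \chi^2\right) = 1 - F_{\chi^2_{\nu}}\left(\chi^2\right)
```

The p-value is the upper-tail probability of a chi-squared distribution with $\nu$ degrees of freedom; the null hypothesis of independence is rejected when $p$ falls below the chosen significance level (for example 0.05).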

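    import pandas as pd
    from scipy import stats

    def my_chi2(column, target):
        """
        Compute the chi-squared test of independence between column and target.

        Input:
            column: pandas Series (first categorical feature)
            target: pandas Series (second categorical feature)
        Output:
            chi_square: float
                Test statistic, the sum of (O - E)^2 / E over all cells
            p_value: float
                Upper-tail probability P(X >= chi_square) for a chi^2
                distribution with (rows - 1) * (columns - 1) degrees of freedom
        """
        # Build the contingency table, including row/column totals
        data_crosstab = pd.crosstab(column, target, margins=True, margins_name="Total")
        # Category labels taken from the table itself, excluding the "Total" margins
        rows = data_crosstab.index.drop("Total")
        columns = data_crosstab.columns.drop("Total")
        grand_total = data_crosstab.loc["Total", "Total"]
        # Accumulate the chi-squared statistic cell by cell
        chi_square = 0.0
        for i in columns:
            for j in rows:
                O = data_crosstab.loc[j, i]  # observed count
                # Expected count under independence: row total * column total / grand total
                E = data_crosstab.loc[j, "Total"] * data_crosstab.loc["Total", i] / grand_total
                chi_square += (O - E) ** 2 / E
        # p-value: survival function (1 - CDF) of the chi^2 distribution, not a normal CDF
        dof = (len(rows) - 1) * (len(columns) - 1)
        p_value = stats.chi2.sf(chi_square, dof)
        return chi_square, p_value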
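As a cross-check, SciPy already provides this test as `scipy.stats.chi2_contingency`. The sketch below runs it on a small made-up pair of features (the data and the 0.05 threshold are assumptions for illustration only) and compares it with a vectorized NumPy version of the same Pearson formulas. `correction=False` disables Yates' continuity correction, which SciPy applies only when the table has one degree of freedom (a 2×2 table).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: two categorical features of equal length
column = pd.Series(["a", "a", "b", "b", "b", "c", "c", "a", "b", "c"], name="feature")
target = pd.Series(["x", "y", "x", "x", "y", "y", "y", "x", "x", "y"], name="target")

# Observed contingency table, without margins
observed = pd.crosstab(column, target)

# SciPy's built-in test of independence (no continuity correction)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

# The same statistic computed by hand with NumPy broadcasting:
# E[i, j] = row_total[i] * col_total[j] / grand_total
O = observed.to_numpy(dtype=float)
E = O.sum(axis=1, keepdims=True) * O.sum(axis=0, keepdims=True) / O.sum()
manual_stat = ((O - E) ** 2 / E).sum()
manual_dof = (O.shape[0] - 1) * (O.shape[1] - 1)
manual_p = stats.chi2.sf(manual_stat, manual_dof)  # upper tail of chi^2

print(f"chi2={chi2_stat:.4f}, p={p_value:.4f}, dof={dof}")

# Hypothesis decision at an assumed significance level of 0.05
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the features look dependent")
else:
    print("Fail to reject the null hypothesis of independence")
```

With `correction=False`, `chi2_contingency` should agree with a hand-rolled Pearson statistic such as `my_chi2`; it also returns the table of expected counts, which is useful for checking that no cell is too small for the chi-squared approximation.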
    
    

    【Discussion】: