Equivalent of R's cor.test in Python

【Question title】: Equivalent of R's cor.test in Python
【Posted】: 2015-08-04 02:37:15
【Question】:

Is there a way to get the confidence interval for r in Python?

In R, I can do this:

cor.test(m, h)

    Pearson's product-moment correlation

data:  m and h
t = 0.8974, df = 4, p-value = 0.4202
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6022868  0.9164582
sample estimates:
      cor 
0.4093729

In Python, I can compute r (cor) with:

r,p = scipy.stats.pearsonr(df.age, df.pets)

But this does not return a confidence interval for r.

【Comments】:

Tags: numpy statistics scipy statsmodels

【Solution 1】:

Here is one way to compute the confidence interval.

First, get the (Pearson) correlation:

In [85]: from scipy import stats

In [86]: corr = stats.pearsonr(df['col1'], df['col2'])

In [87]: corr
Out[87]: (0.551178607008175, 0.0)

Apply the Fisher transformation to get z (np here is NumPy, i.e. import numpy as np):

In [88]: z = np.arctanh(corr[0])

In [89]: z
Out[89]: 0.62007264620685021

Then compute sigma, the standard error of z, which is 1/sqrt(n - 3) for n observations:

In [90]: sigma = (1/((len(df.index)-3)**0.5))

In [91]: sigma
Out[91]: 0.013840913308956662

Use the percent point function (inverse CDF) of the standard normal distribution to get the two-sided 95% interval on the z scale:

In [92]: cint = z + np.array([-1, 1]) * sigma * stats.norm.ppf((1+0.95)/2)

Finally, take the hyperbolic tangent to map the bounds back to a 95% interval for r:

In [93]: np.tanh(cint)
Out[93]: array([ 0.53201034,  0.56978224])
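
The steps above can be collected into a small helper. This is a minimal sketch, not a SciPy function: the names fisher_ci and pearsonr_ci are hypothetical, and it assumes the usual Fisher-transformation conditions (roughly bivariate normal data, more than 3 observations, |r| < 1).

```python
import numpy as np
from scipy import stats


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson r from n pairs (hypothetical helper).

    Uses the Fisher z-transformation, as in the steps above.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = np.arctanh(r)                            # Fisher transformation
    se = 1.0 / np.sqrt(n - 3)                    # standard error of z
    crit = stats.norm.ppf((1 + confidence) / 2)  # two-sided critical value
    lo, hi = np.tanh([z - crit * se, z + crit * se])  # back to the r scale
    return float(lo), float(hi)


def pearsonr_ci(x, y, confidence=0.95):
    """Return (r, p, lo, hi) for two equal-length 1-D samples (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    r, p = stats.pearsonr(x, y)
    lo, hi = fisher_ci(r, x.size, confidence)
    return float(r), float(p), lo, hi
```

As a check against the question: fisher_ci(0.4093729, 6) (df = 4 in the R output means n = 6) gives roughly (-0.602, 0.916), in line with the interval cor.test printed.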

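【Discussion】:

Thanks, that solved it. I wonder why (or whether) statsmodels and/or scipy don't already provide this.
Well, I'm surprised too, or maybe I just haven't searched Stack Overflow well enough.
Any update on this? Ideally there would be a one-line scipy function that computes it, rather than the eight-line approach you gave above.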
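
On the one-liner: to my knowledge, newer SciPy releases (1.9 and later) return a result object from stats.pearsonr that has a confidence_interval method, so something like the sketch below may be enough. The sample data is made up for illustration, and the hasattr guard is only there so the snippet also runs on older versions, where interval stays None.

```python
import numpy as np
from scipy import stats

# Made-up sample data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

res = stats.pearsonr(x, y)

# Older SciPy returns a plain tuple without this method.
if hasattr(res, "confidence_interval"):
    ci = res.confidence_interval(confidence_level=0.95)
    interval = (float(ci.low), float(ci.high))
else:
    interval = None  # fall back to the manual Fisher steps in the answer

print(interval)
```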