【Question title】: Python: Implement mean of means 95% Confidence Interval?
【Posted】: 2018-01-22 18:13:57
【Question description】:

How can this solution be implemented using pandas/Python? This question is about implementing this stats.stackexchange solution for finding the 95% CI around a mean of means.

import pandas as pd
from IPython.display import display
import scipy
import scipy.stats as st 
import scikits.bootstrap as bootstraps

data = pd.DataFrame({
     "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52]
    ,"exp3":[29, 31, 35]
}).T

data.loc[:,"row_mean"] = data.mean(axis=1)
data.loc[:,"row_std"] = data.std(axis=1)
display(data)

      0   1   2   row_mean   row_std
exp1  34  41  39  38.000000  2.943920
exp2  45  51  52  49.333333  3.091206
exp3  29  31  35  31.666667  2.494438

mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()
confidence = 0.95
print("mean(means): {}\nstd(means):{}".format(mean_of_means,std_of_means))
  • mean(means): 39.66666666666667
  • std(means):8.950481054731702

First incorrect attempt (z-score):

zscore = st.norm.ppf(1-(1-confidence)/2)
lower_bound = mean_of_means - (zscore*std_of_means)
upper_bound = mean_of_means + (zscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))
  • 95% CI = [22.1,57.2] (incorrect solution)

Second incorrect attempt (t-score):

tscore = st.t.ppf(1-0.05, data.shape[0])
lower_bound = mean_of_means - (tscore*std_of_means)
upper_bound = mean_of_means + (tscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))
  • 95% CI = [18.60,60.73] (incorrect solution)

Third incorrect attempt (bootstrap):

CIs = bootstraps.ci(data=data.row_mean, statfunction=scipy.mean,alpha=0.05)
  • 95% CI = [31.67, 49.33] (incorrect solution)
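
One possible reason the bootstrap attempt lands on [31.67, 49.33]: with only three experiment means, a resampled mean can never fall below the smallest mean or above the largest one. The following is a minimal sketch of that effect, assuming a plain percentile bootstrap written in NumPy, which may differ from the method `scikits.bootstrap` applies by default. The names `experiments`, `row_means`, and `boot_means`, the seed, and the resample count are choices made for this illustration.

```python
import numpy as np

# Data from the question: one row per experiment
experiments = np.array([[34, 41, 39], [45, 51, 52], [29, 31, 35]])
row_means = experiments.mean(axis=1)

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Resample the three means with replacement and take the mean of each resample
resamples = rng.choice(row_means, size=(10_000, row_means.size), replace=True)
boot_means = resamples.mean(axis=1)

# Percentile bootstrap interval (2.5th and 97.5th percentiles)
lower_bound, upper_bound = np.percentile(boot_means, [2.5, 97.5])
print("bootstrap 95% CI = [{},{}]".format(lower_bound, upper_bound))
print("min/max of means = [{},{}]".format(row_means.min(), row_means.max()))
```

Because a resample consists of three copies of the smallest mean with probability 1/27 (about 3.7%, more than 2.5%), the percentile interval collapses onto the minimum and maximum of the three means, which is far narrower than the t-based target.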

How can this solution be implemented with pandas/Python to obtain the correct solution below?

  • 95% CI = [17.4 to 61.9] (correct solution)

【Question discussion】:

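  • Maybe scikits-bootstrap does what you want?
  • @xaav, I just added an example using this suggestion. Unfortunately it does not give the correct solution, though I may be using it incorrectly. I'm not sure whether alpha should be set to 0.05 or 0.025, but either way the result is incorrect.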

Tags: python pandas statistics average confidence-interval


【Solution 1】:

Thanks to Jon Bates.
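
As far as I can tell, the code below differs from the attempts in the question in two ways: the standard deviation of the means is divided by the square root of n to give a standard error, and the t critical value is two-sided with n - 1 = 2 degrees of freedom. Written out with the values from the printed output, where \bar{x} is the mean of the means, s is their sample standard deviation, and n = 3 is the number of experiments:

```latex
\[
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  = 39.667 \pm 4.3027 \times \frac{8.9505}{\sqrt{3}}
  \approx 39.667 \pm 22.234
  = [17.43,\ 61.90]
\]
```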

import pandas as pd
import scipy
import scipy.stats as st 

data = pd.DataFrame({
     "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52]
    ,"exp3":[29, 31, 35]
}).T

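data.loc[:,"row_mean"] = data.mean(axis=1)
data.loc[:,"row_std"] = data.std(axis=1)

# Mean and sample standard deviation (ddof=1) of the per-experiment means
mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()

# Two-sided 95% t critical value with n-1 degrees of freedom (n = number of experiments)
tscore = st.t.ppf(1-0.025, data.shape[0]-1)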

print("mean(means): {}\nstd(means): {}\ntscore: {}".format(mean_of_means,std_of_means,tscore))

lower_bound = mean_of_means - (tscore*std_of_means/(data.shape[0]**0.5))
upper_bound = mean_of_means + (tscore*std_of_means/(data.shape[0]**0.5))

print("95% CI = [{},{}]".format(lower_bound,upper_bound))

mean(means): 39.66666666666667
std(means): 8.950481054731702
tscore: 4.302652729911275
95% CI = [17.432439139464606,61.90089419386874]
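
As a cross-check, the same interval can also be obtained from SciPy's built-in t-interval helper instead of assembling the bounds by hand. This is a minimal sketch, assuming `scipy.stats.t.interval` and `scipy.stats.sem` are available in the installed SciPy. The names `experiments`, `row_means`, and `sem` are introduced here for illustration only.

```python
import numpy as np
import scipy.stats as st

# Data from the question: one row per experiment
experiments = np.array([[34, 41, 39], [45, 51, 52], [29, 31, 35]])
row_means = experiments.mean(axis=1)

n = row_means.size
mean_of_means = row_means.mean()
sem = st.sem(row_means)  # sample std (ddof=1) divided by sqrt(n)

# The confidence level is passed positionally because its keyword name
# differs between SciPy versions
lower_bound, upper_bound = st.t.interval(0.95, n - 1, loc=mean_of_means, scale=sem)
print("95% CI = [{},{}]".format(lower_bound, upper_bound))
```

The bounds should agree with the manually computed interval above up to floating-point rounding.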

【Discussion】:
