【发布时间】:2017-09-06 02:22:18
【问题描述】:
我正在研究一个嵌套逻辑回归模型,其中 3 个结果代表选项 A、B 或 C。第一级代表 A 和 B 或 C 之间的选择,第二级代表 B 和 C 之间的选择。下面是一些虚构数据的代码,但我不确定我是否正确指定了模型。根据定义,B 或 C 的概率大于 B 的概率,但是当从后验中对非常小的样本量进行抽样时,“BorC”可能小于 B。这样小的样本量可能不会出现在实际数据中我很感兴趣,但发生这种情况的事实让我觉得我做错了什么。谢谢!
import numpy as np
import pandas as pd
import pymc3 as pm
from scipy.special import logit
import matplotlib.pyplot as plt
from theano import shared
import cPickle as pickle # python 2
def hierarchical_normal(name, shape, mu=0.,cs=5.):
delta = pm.Normal('delta_{}'.format(name), 0., 1., shape=shape)
sigma = pm.HalfCauchy('sigma_{}'.format(name), cs)
return pm.Deterministic(name, mu + delta * sigma)
NUTS_KWARGS = {'target_accept': 0.99}
SEED = 4260026 # from random.org, for reproducibility
np.random.seed(SEED)
ndraws = 1000
counts =[[19, 50, 37],
[21, 67, 55],
[11, 53, 38],
[17, 54, 45],
[24, 93, 66],
[27, 53, 70]]
counts = pd.DataFrame(counts,columns=["A","B","C"])
counts["BorC"] = counts[["B","C"]].sum(axis=1)
counts["n"] = counts[["A","B","C"]].sum(axis=1)
print counts
group = counts.index.values
n_group = np.unique(group).size
obs_B = counts.B.values
obs_BorC = counts.BorC.values
obs_n = counts.n.values
obs_n_ = shared(obs_n)
with pm.Model() as model:
beta0 = pm.Normal('beta0', 0.,sd=5.)
beta_group = hierarchical_normal('beta_group', n_group)
eta_BorC = beta0 + beta_group[group]
p_BorC = pm.math.sigmoid(eta_BorC)
like_BorC = pm.Binomial('obs_BorC', obs_n_, p_BorC, observed=obs_BorC)
alpha0 = pm.Normal('alpha0', 0.,sd=5.)
alpha_group = hierarchical_normal('alpha_group', n_group)
eta_BgivenBorC = alpha0 + alpha_group[group]
p_BgivenBorC = pm.math.sigmoid(eta_BgivenBorC)
like_BgivenBorC = pm.Binomial('obs_BgivenBorC', obs_BorC, p_BgivenBorC, observed=obs_B)
p_B = p_BgivenBorC*p_BorC
like_B = pm.Binomial('obs_B', obs_n_, p_B, observed=obs_B)
trace = pm.sample(draws=ndraws, random_seed=SEED,nuts_kwargs=NUTS_KWARGS)
obs_n_.set_value(np.array([10]*6))
pp_trace = pm.sample_ppc(trace, model=model)
C = pp_trace['obs_BorC']-pp_trace['obs_B']
print np.min(np.min(C))
B = pp_trace['obs_B']
C = np.sum(C,axis=1)
B = np.sum(B,axis=1)
diff = C-B
plt.figure()
plt.hist(diff,50)
plt.show()
编辑:我从浏览 pymc3 代码中看到,日志可能性会自动对所有变量求和,这意味着我对 like_B 的规范是多余的。在这种情况下,我想我只需要弄清楚如何正确设置 obs_BorC 以进行后验采样。
【问题讨论】: