【发布时间】:2025-10-07 12:35:02
【问题描述】:
我正在尝试重新创建 John Kruschke 的“Doing Bayesian Data Analysis”中的模型,并且目前正在尝试对序数数据进行建模(本书第 23 章。这是我正在尝试重新创建的 JAGS 模型:
total = length(y)
#Threshold 1 and nYlevels-1 are fixed; other thresholds are estimated.
#This allows all parameters to be interpretable on the response scale.
nYlevels = max(y)
thresh = rep(NA,nYlevels-1)
thresh[1] = 1 + 0.5
thresh[nYlevels-1] = nYlevels-1 + 0.5
modelString = "
model {
for ( i in 1:Ntotal ) {
y[i] ~ dcat( pr[i,1:nYlevels] )
pr[i,1] <- pnorm( thresh[1] , mu[x[i]] , 1/sigma[x[i]]^2 )
for ( k in 2:(nYlevels-1) ) {
pr[i,k] <- max( 0 , pnorm( thresh[ k ] , mu[x[i]] , 1/sigma[x[i]]^2 )
- pnorm( thresh[k-1] , mu[x[i]] , 1/sigma[x[i]]^2 ) )
}
pr[i,nYlevels] <- 1 - pnorm( thresh[nYlevels-1] , mu[x[i]] , 1/sigma[x[i]]^2 )
}
for ( j in 1:2 ) { # 2 groups
mu[j] ~ dnorm( (1+nYlevels)/2 , 1/(nYlevels)^2 )
sigma[j] ~ dunif( nYlevels/1000 , nYlevels*10 )
}
for ( k in 2:(nYlevels-2) ) { # 1 and nYlevels-1 are fixed, not stochastic
thresh[k] ~ dnorm( k+0.5 , 1/2^2 )
}
}
此模型涉及使用累积法线函数(R 中的pnorm)。我尝试使用 Scipy 的 norm.cdf 进行如下操作:
nYlevels = df.Y.cat.categories.size
thresh = np.arange(1.5, nYlevels)
with pm.Model() as ordinal_model:
#priors for mu and sigma
mu = pm.Normal('mu', (1+ nYlevels)/2.0, 1.0/(nYlevels**2))
sig = pm.Uniform('sig', nYlevels/1000.0, nYlevels*10.0)
#priors for the other parameters (one for each ordinal level, except the fixed lowest and highest levels)
theta2 = pm.Normal('theta2',mu=thresh[1], sigma = (1/2)**2)
theta3 = pm.Normal('theta3',mu=thresh[2], sigma = (1/2)**2)
theta4 = pm.Normal('theta4',mu=thresh[3], sigma = (1/2)**2)
theta5 = pm.Normal('theta5',mu=thresh[4], sigma = (1/2)**2)
#probabilities for the ordinal levels
prob_theta1 = norm(loc=mu, scale = (1 / (sig ** 2))).cdf(thresh[0])
prob_theta2 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta2)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(thresh[0]))])
prob_theta3 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta3)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta2))])
prob_theta4 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta4)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta3))])
prob_theta5 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta5)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta4))])
prob_theta6 = 1 - (norm(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta5))
pr = np.array([prob_theta1,prob_theta2,prob_theta3,prob_theta4,prob_theta5,prob_theta6])
y = pm.Categorical('y', p = pr, observed=df.Y.cat.codes.values)
但是,当我运行它时,我收到一个错误:'TypeError: Unsupported dtype for TensorType: object'。它发生在我使用 norm.cdf 的行。也许 stats.scipy.norm 不能与 pymc3 对象一起使用?
所以我想知道如何在 PyMC3 模型中使用累积法线函数,或者如何让 norm.cdf 在这里工作。
【问题讨论】:
-
谢谢,我已经编辑了问题。
-
感谢更新。是的,PyMC3 RandomVariable 对象不能立即与 NumPy 兼容,因为它们不容易转换为数字——它们是 Theano 计算图中的节点。相反,尝试仅使用 Theano Tensor operations 和 PyMC3 方法重写。值得注意的是,所有 PyMC3 发行版都实现了
logcdf。 The tutorial on distributions 也可能有用。抱歉,我没有时间明确地找出答案。 -
谢谢,这有帮助,我会努力解决的。
-
我在尝试实现特征曲线是正常 CDF 的项目响应模型时遇到了同样的问题。我能够通过使用 Theano 张量运算符
erf操作来完成它,就像 @merv 建议的那样:import theano.tensor as tt; with pm.Model(): b = Normal("b", mu=0, sigma=10); phat = .5 * (1 + tt.erf((b - 0) / np.sqrt(2))); Y = Bernoulli("Y", p=phat, observed=1)希望这会有所帮助!