Ordered Probit estimation in Stan

【Question title】: Ordered Probit estimation in Stan
【Posted】: 2018-11-27 00:54:12
【Question】:

I am trying to replicate in Stan the ordered probit JAGS model from John Kruschke's "Doing Bayesian Data Analysis" (p. 676):

JAGS model:

model {
    for ( i in 1:Ntotal ) {
      y[i] ~ dcat( pr[i,1:nYlevels] )
      pr[i,1] <- pnorm( thresh[1] , mu , 1/sigma^2 )
      for ( k in 2:(nYlevels-1) ) {
        pr[i,k] <- max( 0 ,  pnorm( thresh[ k ] , mu , 1/sigma^2 )
                           - pnorm( thresh[k-1] , mu , 1/sigma^2 ) )
      }
      pr[i,nYlevels] <- 1 - pnorm( thresh[nYlevels-1] , mu , 1/sigma^2 )
    }
    mu ~ dnorm( (1+nYlevels)/2 , 1/(nYlevels)^2 )
    sigma ~ dunif( nYlevels/1000 , nYlevels*10 )
    for ( k in 2:(nYlevels-2) ) {  # 1 and nYlevels-1 are fixed, not stochastic
      thresh[k] ~ dnorm( k+0.5 , 1/2^2 )
    }
  }

So far I have the following, which runs but does not reproduce the results in the book. Stan model:

data{
  int<lower=1> n; // number of obs
  int<lower=3> n_levels; // number of categories

  int y[n]; // outcome var 
}

parameters{
  real mu; // latent mean
  real<lower=0> sigma; // latent sd
  ordered[n_levels] thresh; // thresholds

}

model{
  vector[n_levels] pr[n];

  mu ~ normal( (1+n_levels)/2 , 1/(n_levels)^2 );
  sigma ~ uniform( n_levels/1000 , n_levels*10 );


  for ( k in 2:(n_levels-2) ) // 1 and nYlevels-1 are fixed, not stochastic
    thresh[k] ~ normal( k+0.5 , 1/2^2 );

  for(i in 1:n) {

    pr[i, 1] = normal_cdf(thresh[1], mu, 1/sigma^2);

    for (k in 2:(n_levels-1)) {
      pr[i, k] = max([0.0, normal_cdf(thresh[k], mu, 1/sigma^2) - normal_cdf(thresh[k-1], mu, 1/sigma^2)]);
    }

    pr[i, n_levels] = 1 - normal_cdf(thresh[n_levels - 1], mu, 1/sigma^2);

    y[i] ~ categorical(pr[i, 1:n_levels]);
  }

}

The data are here:

list(n = 100L, n_levels = 7, y = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 7L))

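The model should recover a mu of 1.0 and a sigma of 2.5. Instead, I get a mu of 3.98 and a sigma of 1.25.

I'm sure I'm doing something wrong in the Stan model, but I'm still a beginner and don't know what to try next. Thanks!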

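【Comments】:

One basic thing to check is whether you have specified the normal distribution correctly: "Unlike JAGS, Stan defines the normal distribution in terms of mean and standard deviation, not mean and precision" (from: ling.uni-potsdam.de/~vasishth/JAGSStanTutorial/…), so you want normal(mean, sd) rather than normal(mean, 1 / sd^2).
You also have to be careful about identifiability in these models. You can't have both an intercept and completely free cutpoints. Did you run multiple chains and get Rhat values near 1 and decent effective sample sizes?
Thanks @Marius! I posted a new model as an answer using your suggestions.
Thanks @BobCarpenter! I posted a new model as an answer using your suggestions.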
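
To make the first comment concrete, here is a minimal Python sketch (standard library only) of what happens when a JAGS precision is passed where Stan expects a standard deviation. The threshold value 1.5 is an assumption chosen for illustration; `mu = 1.0` and `sigma = 2.5` are the values the question expects to recover, and `n_levels = 7` comes from the posted data.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of Normal(mean, sd), where sd is a standard deviation (Stan's convention)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

n_levels = 7
mu, sigma = 1.0, 2.5   # values the question expects to recover
thresh_1 = 1.5         # assumed first threshold, for illustration only

# Third argument is the standard deviation, as Stan expects.
p_sd = normal_cdf(thresh_1, mu, sigma)

# Third argument copied from JAGS: the precision 1/sigma^2 is read as a standard deviation.
p_prec = normal_cdf(thresh_1, mu, 1.0 / sigma**2)

# The same slip in the prior on mu: JAGS precision 1/n_levels^2 means sd = n_levels.
prior_sd_intended = float(n_levels)
prior_sd_as_written = 1.0 / n_levels**2

print(f"P(y = 1), sd = sigma:       {p_sd:.4f}")
print(f"P(y = 1), sd = 1/sigma^2:   {p_prec:.4f}")
print(f"prior sd on mu, intended:   {prior_sd_intended:.4f}")
print(f"prior sd on mu, as written: {prior_sd_as_written:.4f}")
```

Under these assumptions the two likelihood terms differ substantially, and the prior on mu as written is far tighter than the one in the JAGS model. With its mean at (1 + 7)/2 = 4, that tight prior may be part of why the posterior for mu sat near 3.98.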

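Tags: r bayesian stan rstan

【Answer 1】:

Update: after searching online (special thanks to Conor Goold), I came up with this solution, which replicates the results in the book very closely. Of course, any feedback or a better breakdown of the model will still get the accepted answer!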

data {
  real L;                     // Lower fixed threshold
  real<lower=L> U;            // Upper fixed threshold

  int<lower=2> J;             // Number of outcome levels

  int<lower=0> N;             // Data length

  int<lower=1,upper=J> y[N];  // Ordinal responses
}

transformed data {
  real<lower=0> diff;         // difference between upper and lower fixed thresholds
  int<lower=1> K;             // Number of thresholds

  K = J - 1;
  diff = U - L;
}

parameters {
  simplex[K - 1] thresh_raw;      // raw thresholds
  real mu; // latent mean
  real<lower=0> sigma; // latent sd
}

transformed parameters {
  ordered[K] thresh;     // new thresholds with fixed first and last

  thresh[1] = L;
  thresh[2:K] = L + diff * cumulative_sum(thresh_raw);
  thresh[K] = U; // Overwrite last value to fix it
}

model{
  vector[J] theta;                  // local parameter for ordinal categories

  //priors
  mu ~ normal( (1+J)/2.0 , J );
  sigma ~ uniform( J/1000.0 , J*10 );

  for (i in 2:(K-1))                 // free thresholds, as in JAGS: k in 2:(nYlevels-2)
    thresh[i] ~ normal(i + 0.5, 2);  // mean k + 0.5, sd 2 (JAGS precision 1/2^2)

  // likelihood statement
  for(n in 1:N) {

    // probability of ordinal category definitions
    theta[1] = normal_cdf( thresh[1] , mu, sigma );

    for (l in 2:K)
      theta[l] = fmax(0, normal_cdf(thresh[l], mu, sigma ) - normal_cdf(thresh[l-1], mu, sigma));

    theta[J] = 1 - normal_cdf(thresh[K] , mu, sigma);

    y[n] ~ categorical(theta);
  }
}
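
The threshold construction in the `transformed parameters` block can be checked outside Stan. The sketch below mirrors it in plain Python; the helper name `thresholds_from_simplex`, the end points 1.5 and J - 0.5, and the evenly split simplex are all assumptions made for illustration, not values taken from the answer.

```python
from itertools import accumulate

def thresholds_from_simplex(lower, upper, simplex):
    """Map a simplex of length K-1 to K thresholds with fixed end points."""
    if any(s < 0 for s in simplex) or abs(sum(simplex) - 1.0) > 1e-9:
        raise ValueError("simplex entries must be non-negative and sum to 1")
    width = upper - lower
    # Mirrors thresh[1] = L and thresh[2:K] = L + diff * cumulative_sum(thresh_raw).
    return [lower] + [lower + width * c for c in accumulate(simplex)]

J = 7                                   # outcome levels, as in the question's data
L, U = 1.5, J - 0.5                     # assumed fixed end points
thresh_raw = [0.2, 0.2, 0.2, 0.2, 0.2]  # hypothetical simplex of length K - 1 = 5
thresh = thresholds_from_simplex(L, U, thresh_raw)
print(thresh)
```

Because the simplex sums to 1, the last cumulative sum already lands on the upper end point, so the `thresh[K] = U` line in the Stan code only guards against floating-point drift.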

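【Comments】:

Those hard interval priors on sigma are a bad idea statistically (if any mass is near the boundary, you run into computational problems and biased estimates), and they violate Stan's basic principle that every parameter value satisfying the declared constraints should have support (here, values greater than J*10 satisfy the non-negativity constraint but have no support).
The second problem is that applying fmax breaks differentiability, which is why the manual calls out applying it to anything that depends on parameters as a bad thing. Stan works by passing gradient information from the final log density value back to the parameters, and fmax and other discrete operations effectively cut those derivatives off on the way back to mu, sigma, etc. Instead, the problem should be set up so that the normal-cdf differences cannot fall below zero. Using a set of ordered cutpoints is probably easier than the cumulative sum of a scaled simplex.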
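
The last point can be illustrated numerically: when the cutpoints are strictly increasing, the CDF values are non-decreasing, so their successive differences are never negative and no `fmax` or `max(0, ·)` guard is needed. This is a minimal Python sketch, not Stan code; the evenly spaced thresholds and the function name `category_probs` are assumptions for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of Normal(mean, sd), where sd is a standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def category_probs(thresh, mu, sigma):
    """Ordered-probit probabilities for len(thresh) + 1 categories."""
    if any(b <= a for a, b in zip(thresh, thresh[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Pad with 0 and 1 so the first and last categories use the same difference rule.
    cdf = [0.0] + [normal_cdf(t, mu, sigma) for t in thresh] + [1.0]
    # Differences of a non-decreasing sequence are non-negative without any clamping.
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

thresh = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]  # assumed, evenly spaced for illustration
probs = category_probs(thresh, mu=1.0, sigma=2.5)
print([round(p, 3) for p in probs])
print(sum(probs))
```

In Stan, declaring the cutpoints as an `ordered` vector enforces the increasing constraint through the parameterization itself, which is what keeps the log density differentiable in mu and sigma.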