如何从两个分布的总和中采样：二项式和泊松答案

【问题标题】：How to sample from a sum of two distributions: binomial and poisson如何从两个分布的总和中采样：二项式和泊松
【发布时间】：2020-03-31 22:47:44
【问题描述】：

有没有办法从两个分布的总和中预测一个值？当我在这里尝试估计 y 时，我在 rstan 上遇到语法错误：y ~ binomial(,) + poisson()


library(rstan)

BH_model_block <- "
data{
  int y; 
  int a; 
}

parameters{
  real <lower = 0, upper = 1> c;
  real <lower = 0, upper = 1> b;
}

model{
  y ~ binomial(a,b)+ poisson(c);
}
"
BH_model <- stan_model(model_code = BH_model_block)
BH_fit <- sampling(BH_model,
                   data = list(y = 5,
                               a = 2), 
                   iter= 1000)

产生这个错误：

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'model2c6022623d56_457bd7ab767c318c1db686d1edf0b8f6' at line 13, column 20
  -------------------------------------------------
    11: 
    12: model{
    13:   y ~ binomial(a,b)+ poisson(c);
                           ^
    14: }
  -------------------------------------------------

PARSER EXPECTED: ";"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '457bd7ab767c318c1db686d1edf0b8f6' due to the above error.

【问题讨论】：

就日志密度而言，y ~ foo(theta) + bar(phi) 的符号应该是什么？您是否正在寻找x + y 的分布x ~ binomial(a, b) 和y ~ poisson(c)？
嗨@BobCarpenter，是的，没错。我正在寻找 x + y 的分布，其中 x ~ binomial(a,b) 和 y ~ poisson(c)。当我在模型块中分离出术语时，我遇到了一个问题，我需要定义新参数“x”，它必须是一个整数。似乎整数不能在参数或转换的参数块中定义。我尝试在模型块中定义“int x”，但初始化失败（可能是因为未定义 x 的范围？）

标签： bayesian rstan

【解决方案1】：

Stan 不支持整数参数，所以从技术上讲你不能这样做。对于两个实变量，它看起来像这样：

parameters {
  real x;
  real y;
}
transformed parameters {
  real z = x + y;
}
model {
  x ~ normal(0, 1);
  y ~ gamma(0.1, 2);
}

然后你得到z 的总和分布。如果变量是离散的，则不会编译。

如果您在模型中不需要z，那么您可以在生成数量块中执行此操作，

generated quantities {
  int x = binomial_rng(a, b);
  int y = poisson_rng(c);
  int z = x + y;
}

这样做的缺点是模型块中没有可用的变量。如果您需要离散参数，则需要按照用户指南关于潜在离散参数的章节（也在关于混合和 HMM 的章节）中的描述将它们边缘化。这对泊松来说并不容易，因为支持是无限的。如果两个离散分布的期望值很小，那么您可以通过对合理值进行循环来近似地做到这一点。

从原帖中的例子看，z 是数据。这与x 和y 的边缘化略有不同，但您只需对x 和y 求和，这样x + y = z 就会大大减少。

【讨论】：

【解决方案2】：

另一种方法是将二项式替换为泊松，并使用泊松可加性：

BH_model_block <- "
data{
  int y; 
  int a; 
}

parameters{
  real <lower = 0, upper = 1> c;
  real <lower = 0, upper = 1> b;
}

model{
  y ~ poisson(a * b + c);
}
"

这不同之处在于，如果b 不小，则二项式的方差比泊松的方差小，但无论如何可能存在过度离散？

【讨论】：