Posted: 2019-02-24 14:44:30
Question:
I am trying to implement the Metropolis algorithm (a simpler version of the Metropolis-Hastings algorithm) in Python.
Here is my implementation:
```python
import numpy as np

def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    """
    Metropolis algorithm using a Gaussian proposal distribution.

    p:         distribution we want to sample from (may be unnormalized)
    z0:        initial sample
    sigma:     standard deviation of the proposal normal distribution
    n_samples: number of final samples we want to obtain
    burn_in:   number of initial samples to discard
    m:         keep only every m-th sample at the end (thinning)
    """
    # List of samples; check feasibility of the first sample and set z to it
    sample_list = [z0]
    _ = p(z0)
    z = z0
    # Counter of drawn samples, used for burn-in
    n_sampled = 0
    while len(sample_list[::m]) < n_samples:
        # Propose a candidate from Normal(z, sigma) and draw a uniform sample
        cand = np.random.normal(loc=z, scale=sigma)
        u = np.random.rand()
        try:
            prob = min(1, p(cand) / p(z))  # acceptance probability
        except (OverflowError, ValueError):
            continue
        n_sampled += 1
        if prob > u:
            z = cand  # accept: the candidate becomes the current sample
        # Do not record burn-in samples
        if n_sampled > burn_in:
            sample_list.append(z)
    # Finally, keep only every m-th sample to reduce autocorrelation
    return np.array(sample_list)[::m]
```
When I apply my algorithm to an exponential function, it takes very little time. However, when I try it on a t-distribution it takes a very long time, even though it is not doing that many more computations. This is how you can reproduce my code:
```python
import matplotlib.pyplot as plt

t_samples = Metropolis_Gaussian(pdf_t, 3, 1, 1000, 1000, m=100)
plt.hist(t_samples, density=True, bins=15, label='histogram of samples')
x = np.linspace(min(t_samples), max(t_samples), 100)
plt.plot(x, pdf_t(x), label='t pdf')
plt.xlim(min(t_samples), max(t_samples))
plt.title("Sampling t distribution via Metropolis")
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend()
```
This code takes a very long time to run, and I am not sure why. In my Metropolis_Gaussian code I already try to improve efficiency by:
- not adding repeated samples to the list
- not recording burn-in samples
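One thing worth noting (my own observation, not something stated in the post): the loop condition `len(sample_list[::m]) < n_samples` builds a fresh list on every iteration, so each check costs O(n) and the whole loop becomes quadratic in the chain length. A counter gives the same stopping rule in O(1) per check; a minimal sketch (variable names are mine):

```python
# Sketch: replace the O(n) slice-length check len(sample_list[::m])
# with an O(1) counter. For a list of length n, len(lst[::m]) == ceil(n/m),
# and ceil(n/m) == (n + m - 1) // m in integer arithmetic.
n_samples, m = 5, 3
sample_list = []
n_kept = 0  # number of samples appended so far

while (n_kept + m - 1) // m < n_samples:  # same test as len(sample_list[::m]) < n_samples
    sample_list.append(0.0)  # stand-in for appending the current sample z
    n_kept += 1

# The counter-based loop stops exactly where the slice-based one would
assert len(sample_list[::m]) == n_samples
```

This changes only the bookkeeping, not which samples are kept.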
The function pdf_t is defined as follows:

```python
from scipy.stats import t

def pdf_t(x, df=10):
    return t.pdf(x, df=df)
```
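If much of the slowdown comes from the per-call overhead of `scipy.stats.t.pdf` in a tight loop (an assumption on my part, not established in the post), the closed-form Student-t density needs only the standard library. The name `t_pdf_fast` is mine:

```python
import math

def t_pdf_fast(x, df=10):
    # Closed-form Student-t density:
    #   f(x) = Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)) * (1 + x^2/df)^(-(df+1)/2)
    # A hypothetical drop-in for pdf_t that avoids scipy's per-call overhead.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)
```

For scalar inputs this should agree with `scipy.stats.t.pdf` up to floating-point error (e.g. for df=1 it reduces to the Cauchy density 1/(pi*(1+x^2))).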
Comments:
-
A very similar question has already been asked on this site.
-
Although from the title it may not look like the same question, the answer I would give you is the same as the one here: Bayesian fit of cosine wave taking longer than expected. I will stress again here that excluding the repeats from failed acceptances is asymptotically incorrect and leads to over-representation of lower-probability sample values.
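To illustrate the point in the comment above, here is a minimal Metropolis sketch (the function name and log-density parameterization are mine, not taken from the linked answers) in which a rejected proposal records the current state again rather than being dropped. Working with log densities also sidesteps the OverflowError the original try/except guards against:

```python
import math
import random

def metropolis(log_p, z0, sigma, n_steps, seed=0):
    # Minimal Metropolis sampler: on rejection the CURRENT state is
    # appended again, so repeated values are kept in the chain.
    rng = random.Random(seed)
    z = z0
    chain = [z0]
    for _ in range(n_steps):
        cand = rng.gauss(z, sigma)
        # Acceptance probability min(1, p(cand)/p(z)), computed in log space
        if rng.random() < math.exp(min(0.0, log_p(cand) - log_p(z))):
            z = cand  # accept
        chain.append(z)  # record z whether or not the move was accepted
    return chain

# Target: standard normal (unnormalized), log density -x^2/2
chain = metropolis(lambda x: -x * x / 2, 0.0, 1.0, 20000)
```

The chain necessarily contains repeated values (every rejection duplicates the current state); discarding them would over-weight the low-probability tails, which is exactly the asymptotic error the comment describes.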
Tags: python performance machine-learning random mcmc