Why is my Python implementation of the Metropolis algorithm (MCMC) so slow?

【Question title】: Why is my Python implementation of the Metropolis algorithm (MCMC) so slow?
【Posted】: 2019-02-24 14:44:30
【Question】:

I am trying to implement the Metropolis algorithm (a simpler version of the Metropolis-Hastings algorithm) in Python.

Here is my implementation:

import numpy as np

def Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    """
    Metropolis Algorithm using a Gaussian proposal distribution.
    p: distribution that we want to sample from (can be unnormalized)
    z0: Initial sample
    sigma: standard deviation of the proposal normal distribution.
    n_samples: number of final samples that we want to obtain.
    burn_in: number of initial samples to discard.
    m: this number is used to take every mth sample at the end
    """
    # List of samples, check feasibility of first sample and set z to first sample
    sample_list = [z0]
    _ = p(z0) 
    z = z0
    # set a counter of samples for burn-in
    n_sampled = 0

    while len(sample_list[::m]) < n_samples:
        # Sample a candidate from Normal(mu, sigma),  draw a uniform sample, find acceptance probability
        cand = np.random.normal(loc=z, scale=sigma)
        u = np.random.rand()
        try:
            prob = min(1, p(cand) / p(z))
        except (OverflowError, ValueError) as error:
            continue
        n_sampled += 1

        if prob > u:
            z = cand  # accept and make candidate the new sample

        # do not add burn-in samples
        if n_sampled > burn_in:
            sample_list.append(z)

    # Finally want to take every Mth sample in order to achieve independence
    return np.array(sample_list)[::m]

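When I apply my algorithm to an exponential function, it takes very little time. However, when I try it on a t-distribution, it takes a very long time, considering that it is not doing that many calculations. This is how you can reproduce my code: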

import matplotlib.pyplot as plt

t_samples = Metropolis_Gaussian(pdf_t, 3, 1, 1000, 1000, m=100)
plt.hist(t_samples, density=True, bins=15, label='histogram of samples')
x = np.linspace(min(t_samples), max(t_samples), 100)
plt.plot(x, pdf_t(x), label='t pdf')
plt.xlim(min(t_samples), max(t_samples))
plt.title("Sampling t distribution via Metropolis")
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.legend()

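This code takes a really long time to run and I am not sure why. In my code for Metropolis_Gaussian, I am trying to improve efficiency by:

1. not adding repeated samples to the list
2. not recording burn-in samples

The function pdf_t is defined as follows: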

from scipy.stats import t
def pdf_t(x, df=10):
    return t.pdf(x, df=df)

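【Comments】:

A very similar question has already been asked on this site.
It may not look like the same question from the title, but the answer I would give you is the same as the one there: Bayesian fit of cosine wave taking longer than expected. I want to stress again here that leaving out the repeats from failed acceptances is asymptotically incorrect, and leads to overrepresentation of lower-likelihood sample values.
Possible duplicate of Bayesian fit of cosine wave taking longer than expected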
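
The point in the comment above about dropping rejected repeats can be illustrated with a toy chain. The sketch below is not from the thread: the two-state target `p = [0.2, 0.8]`, the "always propose the other state" proposal, and the function name `two_state_metropolis` are assumptions chosen only to make the effect easy to see. It records the chain twice, once with every iteration (repeats included) and once with accepted moves only.

```python
import numpy as np

def two_state_metropolis(n_iters, seed=0):
    """Metropolis on a toy two-state target.

    Returns (chain with repeats, chain with accepted moves only).
    """
    rng = np.random.default_rng(seed)
    p = np.array([0.2, 0.8])  # toy target probabilities (illustrative assumption)
    z = 0
    full, accepted_only = [], []
    for _ in range(n_iters):
        cand = 1 - z  # symmetric proposal: always propose the other state
        if p[cand] / p[z] > rng.random():
            z = cand
            accepted_only.append(z)  # recorded only when a move is accepted
        full.append(z)  # recorded every iteration, repeats included
    return np.array(full), np.array(accepted_only)

full, accepted_only = two_state_metropolis(20_000)
frac_full = full.mean()               # fraction of time spent in state 1
frac_accepted = accepted_only.mean()  # same fraction, repeats dropped
print(frac_full, frac_accepted)
```

In this toy setup the chain that keeps repeats spends roughly 80% of its time in state 1, matching the target, while the accepted-only record simply alternates between the two states and so gives each about 50%, which overrepresents the low-probability state.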

Tags: python performance machine-learning random mcmc

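【Solution 1】:

I answered a similar question previously. Many of the things I mentioned there (not computing the current likelihood on every iteration, pre-computing the random innovations, etc.) can be used here.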
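
As a rough illustration of the "do not recompute the current likelihood" tip, the sketch below counts density evaluations for two otherwise identical random-walk loops. The names `CountingDensity`, `unnormalized_normal`, `run_recompute` and `run_cached` are hypothetical helpers introduced here, and the unnormalized standard normal target is a stand-in chosen for simplicity, not the t density from the question.

```python
import numpy as np

class CountingDensity:
    """Wraps a density function and counts how often it is evaluated."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)

def unnormalized_normal(x):
    # Stand-in target (assumption): unnormalized standard normal density
    return np.exp(-0.5 * x * x)

def run_recompute(p, z, steps, rng):
    for _ in range(steps):
        cand = z + rng.normal()
        if p(cand) / p(z) > rng.random():  # p(z) is re-evaluated every iteration
            z = cand
    return z

def run_cached(p, z, steps, rng):
    l_cur = p(z)  # evaluated once up front
    for _ in range(steps):
        cand = z + rng.normal()
        l_cand = p(cand)
        if l_cand / l_cur > rng.random():
            z, l_cur = cand, l_cand  # reuse the candidate's value after acceptance
    return z

steps = 1000
p_a = CountingDensity(unnormalized_normal)
p_b = CountingDensity(unnormalized_normal)
z_a = run_recompute(p_a, 0.0, steps, np.random.default_rng(1))
z_b = run_cached(p_b, 0.0, steps, np.random.default_rng(1))
print(p_a.calls, p_b.calls)
```

With the same seed both loops follow the same path, but the cached version evaluates the density once per iteration plus once at the start, instead of twice per iteration. When each evaluation is expensive, that alone should account for a speed-up approaching a factor of two.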

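Another improvement to your implementation is to not use a list to store your samples. Instead, you should pre-allocate memory for the samples and store them in an array. Something like samples = np.zeros(n_samples) is more efficient than appending to a list on every iteration.

You already mentioned that you tried to improve efficiency by not recording burn-in samples. This is a good idea. You can apply a similar trick to the thinning by recording only every m-th sample, since you discard the others anyway in your return statement with np.array(sample_list)[::m]. You can do this by changing: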

    # do not add burn-in samples
    if n_sampled > burn_in:
        sample_list.append(z)

to

    # Only keep iterations after burn-in and for every m-th iteration
    if n_sampled > burn_in and n_sampled % m == 0:
        samples[(n_sampled - burn_in) // m] = z
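
To make the thinning idea concrete, here is a small sketch checking that writing every m-th value into a pre-allocated array gives the same result as appending everything and slicing with [::m] at the end. The helper names `thin_by_append` and `thin_preallocated` are hypothetical, the random `chain` is only a stand-in for post-burn-in draws, and the sketch uses a zero-based index `i` rather than the `n_sampled` counter from the question.

```python
import numpy as np

def thin_by_append(chain, m):
    # Store every value, then thin at the end (as in the question's code)
    out = []
    for z in chain:
        out.append(z)
    return np.array(out)[::m]

def thin_preallocated(chain, m):
    # Allocate only the slots that survive thinning, then fill them in place
    n_keep = (len(chain) + m - 1) // m  # equals len(chain[::m])
    samples = np.zeros(n_keep)
    for i, z in enumerate(chain):
        if i % m == 0:
            samples[i // m] = z
    return samples

chain = np.random.default_rng(0).normal(size=1003)  # stand-in for post-burn-in draws
a = thin_by_append(chain, 10)
b = thin_preallocated(chain, 10)
print(np.array_equal(a, b), len(b))
```

The pre-allocated version never holds more than the thinned samples in memory, which matters more as m grows.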

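It is also worth noting that you do not need to compute min(1, p(cand) / p(z)); computing p(cand) / p(z) is enough. I realise that the min is formally necessary (to ensure that the probability lies between 0 and 1). Computationally, however, we do not need it, because u is always below 1, so if p(cand) / p(z) > 1 then p(cand) / p(z) is always greater than u.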
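
A quick numerical check of this argument, as a sketch: the `ratios` array below is an arbitrary stand-in for values of p(cand) / p(z) (drawn from an exponential distribution purely so that some values exceed 1), and `u` holds uniform draws in [0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = rng.exponential(scale=1.0, size=100_000)  # stand-in for p(cand) / p(z)
u = rng.random(size=ratios.size)                   # uniform draws in [0, 1)

accept_with_min = np.minimum(1.0, ratios) > u
accept_without_min = ratios > u

# Both rules make the same accept/reject decision for every draw
print(np.array_equal(accept_with_min, accept_without_min))
```

Since every draw of u is strictly below 1, clipping the ratio at 1 can never change the outcome of the comparison.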

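Putting all of this together, along with pre-computing the random innovations and the uniform draws u used in the acceptance test, and only computing likelihoods when you really need to, I came up with: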

import numpy as np

def my_Metropolis_Gaussian(p, z0, sigma, n_samples=100, burn_in=0, m=1):
    # Pre-allocate memory for samples (much more efficient than using append)
    samples = np.zeros(n_samples)

    z = z0
    # Compute the current likelihood once; it is only updated on acceptance
    l_cur = p(z)

    # Total number of iterations to make to achieve desired number of samples
    iters = (n_samples * m) + burn_in

    # Sample outside the for loop
    innov = np.random.normal(loc=0, scale=sigma, size=iters)
    u = np.random.rand(iters)

    for i in range(iters):
        # Random walk innovation on z
        cand = z + innov[i]

        # Compute candidate likelihood
        l_cand = p(cand)

        # Accept or reject candidate
        if l_cand / l_cur > u[i]:
            z = cand
            l_cur = l_cand

        # Only keep every m-th iteration after burn-in, so that exactly
        # n_samples post-burn-in values are stored (z0 itself is not kept)
        k = i - burn_in
        if k >= 0 and (k + 1) % m == 0:
            samples[k // m] = z

    return samples

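If we take a look at the performance, we find that this implementation is about 2x faster than the original, which is not bad for a few minor changes.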

In [1]: %timeit Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
205 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit my_Metropolis_Gaussian(pdf_t, 3, 1, n_samples=100, burn_in=100, m=10)
102 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

【Comments】: