从任意多元函数中高效采样答案

【问题标题】：Efficiently sample from arbitrary multivariate function从任意多元函数中高效采样
【发布时间】：2018-03-10 16:01:46
【问题描述】：

我想从 Python 中的任意函数中采样。

Fast arbitrary distribution random sampling 中提到可以使用逆变换采样，Pythonic way to select list elements with different probability 中提到应该使用逆累积分布函数。据我所知，这些方法仅适用于单变量情况。我的函数是多变量的，而且过于复杂，https://stackoverflow.com/a/48676209/4533188 中的任何建议都适用。

原理：我的函数是基于Rosenbrock的香蕉函数，我们可以用哪个值得到函数的值

import scipy.optimize
scipy.optimize.rosen([1.1,1.2])

（这里[1.1,1.2]是输入向量）来自scipy，见https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.rosen.html。

这是我想出的：我在我感兴趣的区域上制作了一个网格，并为每个点计算函数值。然后我按值对结果数据框进行排序并进行累积和。通过这种方式，我们得到了具有不同大小的“槽”——具有大函数值的点比具有小函数值的点具有更大的槽。现在我们生成随机值并查看随机值落入哪个槽。数据框的行是我们的最终样本。

代码如下：

import scipy.optimize
from itertools import product
from dfply import *

nb_of_samples = 50
nb_of_grid_points = 30

rosen_data = pd.DataFrame(array([item for item in product(*[linspace(fm[0], fm[1], nb_of_grid_points) for fm in zip([-2,-2], [2,2])])]), columns=['x','y'])
rosen_data['z'] = [np.exp(-scipy.optimize.rosen(row)**2/500) for index, row in rosen_data.iterrows()]
rosen_data = rosen_data >> \
    arrange(X.z) >> \
    mutate(z_upperbound=cumsum(X.z)) >> \
    mutate(z_upperbound=X.z_upperbound/np.max(X.z_upperbound))
value = np.random.sample(1)[0]

def get_rosen_sample(value):
    return (rosen_data >> mask(X.z_upperbound >= value) >> select(X.x, X.y)).iloc[0,]

values = pd.DataFrame([get_rosen_sample(s) for s in np.random.sample(nb_of_samples)])

这很好用，但我认为它不是很有效。什么是对我的问题更有效的解决方案？

我读到马尔可夫链 Monte Carlo 可能会有所帮助，但现在我想知道如何在 Python 中做到这一点。

【问题讨论】：

标签： python distribution sampling

【解决方案1】：

我遇到了类似的情况，所以我实现了 Metropolis-Hastings 的基本版本（这是一种 MCMC 方法）来从双变量分布中采样。下面是一个例子。

说，我们要从以下密度中采样：

def density1(z):
    z = np.reshape(z, [z.shape[0], 2])
    z1, z2 = z[:, 0], z[:, 1]
    norm = np.sqrt(z1 ** 2 + z2 ** 2)
    exp1 = np.exp(-0.5 * ((z1 - 2) / 0.8) ** 2)
    exp2 = np.exp(-0.5 * ((z1 + 2) / 0.8) ** 2)
    u = 0.5 * ((norm - 4) / 0.4) ** 2 - np.log(exp1 + exp2)
    return np.exp(-u)

看起来像这样

下面的函数实现了以多元法线为提案的MH

def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples = []
    for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
            xt = xt_candidate
        samples.append(xt)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples

运行 MH 并绘制样本

samples = metropolis_hastings(density1)
plt.hexbin(samples[:,0], samples[:,1], cmap='rainbow')
plt.gca().set_aspect('equal', adjustable='box')
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()

查看我的this repo了解详情。

【讨论】：

这样有效吗？诚然，我只是看了你的帖子，还没有理解。您能详细介绍一下 Metropolis-Hastings 吗？