How can I plot the probability density function for a fitted Gaussian mixture model under scikit-learn? (Answers)

【Question Title】: How can I plot the probability density function for a fitted Gaussian mixture model under scikit-learn?
【Posted】: 2014-06-29 20:36:03
【Question】:

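I'm struggling with a fairly simple task. I have a vector of floats, and I'd like to fit a Gaussian mixture model with two Gaussian components to it: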

from sklearn.mixture import GMM

gmm = GMM(n_components=2)
gmm.fit(values)  # values is numpy vector of floats

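I now want to plot the probability density function of the mixture model I created, but I can't seem to find any documentation on how to do this. How should I best proceed?

Edit:

Here is the data vector I'm fitting. Below is a more detailed example of what I'm doing: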

from sklearn.mixture import GMM
from matplotlib.pyplot import *
import numpy as np

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

with open('/path/to/kde.pickle', 'rb') as f:  # open the data file provided above (binary mode for pickle)
    kde = pickle.load(f)

gmm = GMM(n_components=2)
gmm.fit(kde)

x = np.linspace(np.min(kde), np.max(kde), len(kde))

# Plot the data to which the GMM is being fitted
figure()
plot(x, kde, color='blue')

# My half-baked attempt at replicating the scipy example
fit = gmm.score_samples(x)[0]
plot(x, fit, color='red')

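The fitted curve looks nothing like what I expected. It doesn't even look Gaussian, which is a bit strange, since it was produced by a Gaussian process. Am I crazy?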

【Comments】:

Use plot(x, np.exp(fit), color='red') instead: gmm.score_samples returns the log of the probability density, not the density itself.
@blz The link to the data vector has expired.
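The first comment's point can be checked directly: for 1-D data, `exp(score_samples(x))` should equal the weighted sum of the component normal densities. A minimal sketch, assuming the current `GaussianMixture` class (which replaced the old `GMM`), synthetic data made up for illustration, and SciPy's `norm` for the per-component densities:

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data: two well-separated normal clusters.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 0.5, 500)])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
gmm = GaussianMixture(n_components=2, random_state=0).fit(values.reshape(-1, 1))

x = np.linspace(values.min(), values.max(), 200)

# score_samples returns the log of the density, so exponentiate it.
pdf = np.exp(gmm.score_samples(x.reshape(-1, 1)))

# The same density built by hand: sum over k of w_k * N(x; mu_k, sigma_k^2).
means = gmm.means_.ravel()
stds = np.sqrt(gmm.covariances_.ravel())  # 'full' covariances have shape (2, 1, 1)
manual = sum(w * norm.pdf(x, m, s) for w, m, s in zip(gmm.weights_, means, stds))

print(np.allclose(pdf, manual))
```

In the question's code the equivalent fix is just plotting `np.exp(fit)` instead of `fit`.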

Tags: python matplotlib scikit-learn

【Solution 1】:

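I followed some of the examples mentioned in this thread and elsewhere and managed to get closer to a solution, but the final probability density function does not integrate to one. I think I'll post this as a separate question.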

import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

np.random.seed(1)

mus = np.array([[0.2], [0.8]])
sigmas = np.array([[0.1], [0.1]])  # standard deviations
variances = sigmas ** 2

# Assigning means_/covariances_ by hand is overwritten by fit();
# pass the initial guesses to the constructor instead.
gmm = GaussianMixture(
    n_components=2,
    weights_init=np.array([0.5, 0.5]),
    means_init=mus,
    precisions_init=1.0 / variances.reshape(2, 1, 1),
)

# Fit the GMM with random data from the corresponding Gaussians
# (np.random.normal expects a standard deviation, not a variance)
gaus_samples_1 = np.random.normal(mus[0, 0], sigmas[0, 0], 10).reshape(10, 1)
gaus_samples_2 = np.random.normal(mus[1, 0], sigmas[1, 0], 10).reshape(10, 1)
fit_samples = np.concatenate((gaus_samples_1, gaus_samples_2))
gmm.fit(fit_samples)

fig = plt.figure()
ax = fig.add_subplot(111)
x = np.linspace(0, 1, 1000).reshape(1000, 1)
logprob = gmm.score_samples(x)  # log-density at each x
pdf = np.exp(logprob)
print(np.max(pdf))  # a density can exceed 1; only its integral must equal 1
ax.plot(x, pdf, '-k')
plt.show()

【Discussion】:

This answered my question about probability density values greater than one: math.stackexchange.com/questions/105455/…
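The linked explanation can be illustrated numerically: a narrow mixture has a peak density far above 1, yet the area under it is still 1. A sketch with a hypothetical training set (not the data from the answer above), using a plain trapezoidal sum so it does not depend on whether your NumPy version calls the helper `np.trapz` or `np.trapezoid`:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical narrow clusters (standard deviation 0.01) around 0.2 and 0.8.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(0.2, 0.01, 200), rng.normal(0.8, 0.01, 200)])
gmm = GaussianMixture(n_components=2, random_state=0).fit(samples.reshape(-1, 1))

# Evaluate the density on a fine grid that covers essentially all of the mass.
x = np.linspace(-0.5, 1.5, 20001)
pdf = np.exp(gmm.score_samples(x.reshape(-1, 1)))

# Trapezoidal rule: sum of (left + right) / 2 * width over each interval.
area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(x))

print(f"max density = {pdf.max():.1f}, integral = {area:.4f}")
```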

【Solution 2】:

Take a look at this link:

http://www.astroml.org/book_figures/chapter4/fig_GMM_1D.html

They show how to plot a 1-D GMM in 3 different ways.

【Discussion】:
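One common way to draw such a plot, sketched here without claiming it reproduces the astroML figure: overlay the mixture density and each weighted component on a normalized histogram, and plot the per-component responsibilities underneath. The data are made up, and the sketch relies on the identity w_k N_k(x) = p(x) p(k | x), so each component curve comes from `predict_proba` rather than from the fitted parameters:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data drawn from two normals.
rng = np.random.default_rng(2)
values = np.concatenate([rng.normal(-1.0, 0.5, 300), rng.normal(2.0, 1.0, 700)])
gmm = GaussianMixture(n_components=2, random_state=0).fit(values.reshape(-1, 1))

grid = np.linspace(values.min() - 1, values.max() + 1, 500)
X_grid = grid.reshape(-1, 1)

total = np.exp(gmm.score_samples(X_grid))  # mixture density p(x)
resp = gmm.predict_proba(X_grid)           # responsibilities p(k | x), shape (500, 2)
components = total[:, None] * resp         # weighted component densities w_k * N_k(x)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.hist(values, bins=40, density=True, alpha=0.4, label="data")
ax1.plot(grid, total, "-k", label="mixture")
ax1.plot(grid, components, "--")  # one dashed curve per component
ax1.legend()
ax2.plot(grid, resp)
ax2.set_ylabel("p(component | x)")
plt.show()
```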

【Solution 3】:

Take a look at one of the scikit-learn examples on GitHub:

https://github.com/scikit-learn/scikit-learn/blob/master/examples/mixture/plot_gmm_pdf.py

The idea is to generate a meshgrid, get the GMM's score (the log density) at each grid point, and then plot it.
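The meshgrid idea can be sketched as follows for 2-D data, assuming the current `GaussianMixture` API and made-up data; this is a simplified stand-in rather than a copy of the linked example. It contours the density itself; contouring `-score_samples` on a log scale is another option:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data: two isotropic clusters.
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal([0.0, 0.0], 1.0, size=(300, 2)),
    rng.normal([4.0, 4.0], 0.7, size=(300, 2)),
])
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Build a grid, flatten it into (n_points, 2) rows, and score each row.
xs = np.linspace(-4, 8, 200)
ys = np.linspace(-4, 8, 200)
XX, YY = np.meshgrid(xs, ys)
grid = np.column_stack([XX.ravel(), YY.ravel()])
Z = np.exp(gmm.score_samples(grid)).reshape(XX.shape)  # density on the grid

fig, ax = plt.subplots()
cs = ax.contour(XX, YY, Z, levels=10)
ax.scatter(X[:, 0], X[:, 1], s=2, c="gray")
fig.colorbar(cs, ax=ax)
plt.show()
```

For 1-D data the "meshgrid" reduces to a single column, e.g. `np.linspace(lo, hi, n).reshape(-1, 1)`, and you plot `np.exp(gmm.score_samples(...))` against it.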


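【Discussion】:

I've been trying to apply the same tutorial to my 1-D data, but alas, no luck. Would you mind taking a look at my edit? Maybe it will reveal where I messed up...

【Solution 4】:

I think this is a good resource: https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/

【Discussion】: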
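Since the linked post is about kernel density estimation, here is a hedged sketch of the analogous scikit-learn estimator, `sklearn.neighbors.KernelDensity`, whose `score_samples` likewise returns a log density. The data are made up, and the bandwidth of 0.3 is an arbitrary assumption, not a recommendation:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity

# Hypothetical 1-D data from two normals.
rng = np.random.default_rng(4)
values = np.concatenate([rng.normal(0.0, 1.0, 400), rng.normal(4.0, 0.5, 200)])

# Gaussian-kernel KDE; like GaussianMixture it expects shape (n_samples, 1).
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(values.reshape(-1, 1))
x = np.linspace(-4, 7, 1000)
density = np.exp(kde.score_samples(x.reshape(-1, 1)))  # exponentiate the log density

fig, ax = plt.subplots()
ax.hist(values, bins=40, density=True, alpha=0.4)
ax.plot(x, density, "-r")
plt.show()
```

Unlike a two-component GMM, a KDE makes no assumption about the number of bumps; the bandwidth controls how smooth the curve is.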