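【Question title】: Generating a 2-dimensional sample dataset from a Gaussian mixture distribution
【Posted】: 2021-09-05 17:07:14
【Question】:

I want to generate a two-dimensional sample dataset. I copied the code given in the link and duplicated it to generate two vectors, X and Y, which I then scatter-plotted as a 2D dataset, as shown below. The result is not what I want, though. What I actually want is something like the figure below.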

import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (Jupyter notebooks only)

n = 1000

# x coordinates: a 1D mixture of two Gaussians
mu = [1, 4]
sigma = [2, 1]
p_i = [0.3, 0.7]

x = []
for i in range(n):
    # np.random.multinomial(1, p_i) returns a one-hot vector such as [1, 0],
    # like one roll of a weighted die; argmax gives the index of the chosen
    # component, so mu[z_i] and sigma[z_i] are picked at random.
    z_i = np.argmax(np.random.multinomial(1, p_i))
    x_i = np.random.normal(mu[z_i], sigma[z_i])
    x.append(x_i)

# y coordinates: a second 1D mixture, drawn independently of x
mu = [3, 6]
sigma = [1, 2]
p_i = [0.6, 0.4]

y = []
for i in range(n):
    z_i = np.argmax(np.random.multinomial(1, p_i))
    y_i = np.random.normal(mu[z_i], sigma[z_i])
    y.append(y_i)

plt.scatter(x, y)
plt.show()


Can anyone help me?

【Comments】:

    Tags: python numpy matplotlib dataset


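    【Answer 1】:

    It looks like what you want to plot is data sampled from two different 2D Gaussians. Here is code that generates simulated data that looks like that. Feel free to adjust the means and covariance matrices to suit your needs.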

    import numpy as np
    import matplotlib.pyplot as plt

    # First 2D Gaussian:
    mu = [1, 3]
    cov = [[0.07, 0], [0, 1.8]]
    x, y = np.random.multivariate_normal(mu, cov, 200).T

    plt.figure(figsize=(10, 6))
    plt.scatter(x, y, s=5, color='blue')

    # Second 2D Gaussian:
    mu = [2, 1]
    cov = [[0.8, -0.4], [-0.4, 0.5]]
    x, y = np.random.multivariate_normal(mu, cov, 200).T
    plt.scatter(x, y, s=5, color='red')

    plt.xlim([-2, 8])
    plt.ylim([-6, 10])
    plt.show()
    

    This produces something like the figure below (the two Gaussians are in different colors so that you can see the pattern):
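
If the two clusters should come from a single mixture with mixing weights, like `p_i` in the question, one option is to pick a component for each point and then draw both coordinates jointly from that component. The following is a minimal sketch under that assumption. The function name `sample_gaussian_mixture_2d`, the weights `[0.3, 0.7]`, and the seed are illustrative choices and are not part of the original answer; the means and covariances are reused from the code above.

```python
import numpy as np

def sample_gaussian_mixture_2d(weights, means, covs, n, seed=None):
    """Draw n points from a 2D Gaussian mixture; returns (points, labels)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    # Choose a component index for every point, with probabilities `weights`.
    labels = rng.choice(len(weights), size=n, p=weights)

    points = np.empty((n, 2))
    for k in range(len(weights)):
        mask = labels == k
        count = int(mask.sum())
        if count == 0:
            continue
        # Both coordinates of a point come from the same component,
        # so the (x, y) pairs keep that component's covariance structure.
        points[mask] = rng.multivariate_normal(means[k], covs[k], size=count)
    return points, labels

# Illustrative parameters (weights are hypothetical).
points, labels = sample_gaussian_mixture_2d(
    weights=[0.3, 0.7],
    means=[[1, 3], [2, 1]],
    covs=[[[0.07, 0], [0, 1.8]], [[0.8, -0.4], [-0.4, 0.5]]],
    n=1000,
    seed=0,
)
```

The result can be plotted with `plt.scatter(points[:, 0], points[:, 1], c=labels, s=5)`. Unlike the code in the question, which samples x and y from two unrelated 1D mixtures, each point here takes both of its coordinates from one component.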

    【Comments】:

    • Thanks, TC. Yes, this is exactly what I meant.
    • Dear TC, does this numpy method do its sampling via Monte Carlo simulation?
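
On the last comment: as far as I know, `multivariate_normal` draws independent standard normal pseudo-random numbers and applies a linear transform derived from a factorization of the covariance matrix. Drawing random samples like this is the basic ingredient of Monte Carlo methods. The sketch below illustrates the idea with a Cholesky factor. It is an assumption-laden illustration and not NumPy's actual implementation, which to my knowledge uses an SVD by default.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, 1.0])
cov = np.array([[0.8, -0.4], [-0.4, 0.5]])

# Factor the covariance as cov = L @ L.T (requires cov to be positive definite).
L = np.linalg.cholesky(cov)

# Independent standard normal draws, one row per sample.
z = rng.standard_normal((200_000, 2))

# If z has identity covariance, then mu + z @ L.T has mean mu and covariance cov.
samples = mu + z @ L.T

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
```

With this many samples, `sample_mean` and `sample_cov` should come out close to `mu` and `cov`.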