【Question title】: Python sklearn GaussianMixture: how to get the samples/points in each cluster
【Posted】: 2020-07-19 22:03:19
【Question】:

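I am using a GMM to cluster my dataset into K groups. The model runs fine, but I cannot get the original data points that belong to each cluster. Can you give me some ideas for solving this? Thanks a lot.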

【Comments】:

    Tags: cluster-analysis data-mining gaussian gmm


    【Solution 1】:

    You can do it like this (look at d0, d1 and d2).

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn.mixture import GaussianMixture
    
    # load the iris dataset 
    iris = datasets.load_iris() 
    
    # select first two columns  
    X = iris.data[:, 0:2] 
    
    # turn it into a dataframe 
    d = pd.DataFrame(X) 
    
    # plot the data 
    plt.scatter(d[0], d[1]) 
    
    gmm = GaussianMixture(n_components = 3) 
    
    # Fit the GMM model for the dataset  
    # which expresses the dataset as a  
    # mixture of 3 Gaussian Distribution 
    gmm.fit(d) 
    
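    # Assign a cluster label to each sample
    labels = gmm.predict(d)
    d['labels'] = labels

    # Select the rows that belong to each cluster
    d0 = d[d['labels'] == 0]
    d1 = d[d['labels'] == 1]
    d2 = d[d['labels'] == 2]

    # d0, d1 and d2 now hold the original samples of clusters 0, 1 and 2
    print(d0)
    print(d1)
    print(d2)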
    
    # plot the three clusters in the same plot
    plt.scatter(d0[0], d0[1], c='r')
    plt.scatter(d1[0], d1[1], c='yellow')
    plt.scatter(d2[0], d2[1], c='g')
    plt.show()
    

    # print the lower bound on the log-likelihood
    # reached by the best EM fit
    print(gmm.lower_bound_)

    # print the number of EM iterations the best fit
    # needed to reach convergence
    print(gmm.n_iter_)

    # In the original run this took 8 iterations; the count
    # can differ between runs because initialization is random.
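
If you would rather not go through a DataFrame, the same selection can be sketched with a NumPy boolean mask on the array that was passed to the model. This is a minimal sketch, not part of the original answer. `random_state=0` and the name `points_per_cluster` are assumptions added here so that repeated runs give the same labels.

```python
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture

# Same data as in the answer: first two columns of the iris dataset
X = datasets.load_iris().data[:, 0:2]

# random_state=0 is an assumption, added only for reproducibility
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

# For each cluster index k, keep the rows of X whose predicted label is k
points_per_cluster = [X[labels == k] for k in range(gmm.n_components)]

for k, pts in enumerate(points_per_cluster):
    print(f"cluster {k}: {pts.shape[0]} points")
```

`points_per_cluster[0]` then plays the role of `d0`, and so on. Each entry is a plain array with the original feature columns only.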
    

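    【Comments】:

    • What if I have K clusters and want to put the data of each cluster, from 0 up to K, into its own dataframe? How can I use a for loop over the clusters and name each result, instead of writing d0 = d[d['labels'] == 0], d1 = d[d['labels'] == 1], d2 = d[d['labels'] == 2] by hand?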
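
One possible way to handle an arbitrary K, sketched here under assumptions rather than taken from the thread, is to store the per-cluster DataFrames in a dictionary keyed by the cluster label instead of creating separately named variables. `K = 4`, `random_state=0` and the name `clusters` are hypothetical choices for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.mixture import GaussianMixture

K = 4  # hypothetical number of clusters, chosen only for illustration

X = datasets.load_iris().data[:, 0:2]

# random_state=0 is an assumption, added only for reproducibility
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)

d = pd.DataFrame(X)
d["labels"] = gmm.predict(X)

# One DataFrame per cluster label; the label column is dropped again
# so each entry contains only the original feature columns
clusters = {
    label: group.drop(columns="labels")
    for label, group in d.groupby("labels")
}

for label, group in clusters.items():
    print(label, group.shape)
```

`clusters[0]` then corresponds to `d0`, `clusters[1]` to `d1`, and so on. A label that receives no samples simply does not appear as a key, so it is safer to iterate over `clusters.items()` than over `range(K)`.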