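【Question title】: How do I obtain the individual centroids of k-means clusters using nltk (python)
【Posted】: 2020-04-19 15:56:23
【Question】:

I am using nltk to perform k-means clustering because I want to change the distance metric to cosine distance. However, how do I obtain the centroids of all the clusters?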

kclusterer = KMeansClusterer(8, distance = nltk.cluster.util.cosine_distance, repeats = 1)
predict = kclusterer.cluster(features, assign_clusters = True)
centroids = kclusterer._centroid
df_clustering['cluster'] = predict
#df_clustering['centroid'] = centroids[df_clustering['cluster'] - 1].tolist()
df_clustering['centroid'] = centroids

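I am trying to perform k-means clustering on a pandas dataframe, and I would like the coordinates of each data point's cluster centroid to appear in the dataframe column 'centroid'.

Thank you in advance!

【Question comments】:

    Tags: python nltk k-means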


    【Solution 1】:
    import nltk
    import numpy as np
    import pandas as pd
    from nltk.cluster import KMeansClusterer

    # Dummy dataframe with 3 features
    df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
                      columns=['feature1', 'feature2', 'feature3'])
    print(df)

    # Request 2 clusters, using cosine distance
    obj = KMeansClusterer(2, distance=nltk.cluster.util.cosine_distance)
    vectors = [np.array(f) for f in df.values]

    # The returned labels are 0-based indices into obj.means()
    df['predicted_cluster'] = obj.cluster(vectors, assign_clusters=True)

    print(obj.means())
    # Output (cluster order may differ between runs):
    # [array([50.055, 52.39 , 52.   ]), array([1.5 , 4.  , 5.75])]
    # i.e. the per-feature mean of each cluster; there are 2 because we asked for 2 clusters

    # Now, if you want each row's cluster center in the pandas dataframe:
    df['centroid'] = df['predicted_cluster'].apply(lambda x: obj.means()[x])
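
If you would rather have one column per centroid coordinate than a single column of arrays, the lookup can be done with NumPy integer indexing. The following is only a sketch: `means` and `labels` are hard-coded here from the example output above so that the result is reproducible (in practice they would come from `obj.means()` and `obj.cluster(vectors, assign_clusters=True)`), and the `centroid_*` column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed inputs, copied from the example output above. In practice:
#   means  = obj.means()
#   labels = obj.cluster(vectors, assign_clusters=True)
means = [np.array([50.055, 52.39, 52.0]), np.array([1.5, 4.0, 5.75])]
labels = [1, 0, 1, 0]

df = pd.DataFrame(
    [[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
    columns=["feature1", "feature2", "feature3"],
)
df["predicted_cluster"] = labels

# Stack the centroids into an (n_clusters, n_features) array, then use the
# 0-based labels as row indices to pick one centroid per data point.
centroid_matrix = np.vstack(means)
per_row = centroid_matrix[df["predicted_cluster"].to_numpy()]

# One column per centroid coordinate (hypothetical column names).
for i, name in enumerate(["feature1", "feature2", "feature3"]):
    df[f"centroid_{name}"] = per_row[:, i]

print(df)
```

Because the labels are 0-based, they index the centroid array directly; subtracting 1, as in the commented-out line in the question, would shift every row to the wrong centroid.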
    

    【Comments】:
