【Question title】: Extracting centroids with their data points using K-Medoids clustering in Python
【Posted】: 2020-11-17 06:30:25
【Question description】:

I have some data in a 1-D array X containing 10 elements. I applied KMedoids clustering to this data with 3 clusters. After fitting, I got the cluster label (id) and the centroid of each cluster.

from sklearn_extra.cluster import KMedoids
import numpy as np

# Ten 1-D samples, reshaped to the (n_samples, n_features) layout the estimator expects
X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496 ])
X = X.reshape(-1, 1)

model1 = KMedoids(n_clusters=3, random_state=0).fit(X)
cluster_labels = model1.predict(X)
# Cluster ids and the number of points in each cluster
clusters, counts = np.unique(cluster_labels[cluster_labels>=0],
                             return_counts=True)
centroids = np.array(model1.cluster_centers_)

print("For centroids", centroids)
print("***************")
for i in range(len(X)):
    print(i, X[i])

The output of this code is:

For centroids [[0.85566274]
     [0.85042336]
     [0.82019115]]
    ***************
    0 [0.85142858]
    1 [0.85566274]
    2 [0.85364912]
    3 [0.81536489]
    4 [0.84929932]
    5 [0.85042336]
    6 [0.84899714]
    7 [0.82019115]
    8 [0.86112067]
    9 [0.8312496]

However, I want to display each centroid together with the matching data point and its index. For example:

For centroids [[0.85566274] , 1 [0.85566274]
For centroids [0.85042336]  , 5 [0.85042336]
For centroids [0.82019115]] , 7 [0.82019115]

How can I do this?

【Comments】:

    Tags: python pandas numpy scikit-learn cluster-analysis


    【Solution 1】:

    You can print a table with the label, medoid, and index as follows:

    import numpy as np
    from sklearn_extra.cluster import KMedoids
    
    X = np.array([[0.85142858],
                  [0.85566274],
                  [0.85364912],
                  [0.81536489],
                  [0.84929932],
                  [0.85042336],
                  [0.84899714],
                  [0.82019115],
                  [0.86112067],
                  [0.8312496 ]])
    
    kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
    
    print('Label   Medoid        Index')
    print('---------------------------')
    for index in kmedoids.medoid_indices_:
        label = kmedoids.labels_[index]
        medoid = X[index]
        print(f'{label:<7} {medoid}  {index}')
    

    Output

    Label   Medoid        Index
    ---------------------------
    0       [0.85566274]  1
    1       [0.85042336]  5
    2       [0.82019115]  7
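
If only the centroid values are at hand (for example, a saved copy of `cluster_centers_`), the row indices can also be recovered by matching each centroid against the data. The following is a minimal sketch using NumPy only; the helper name `medoid_row_indices` is hypothetical, and the centroid values are copied from the output shown in the question. It relies on the fact that a medoid is always one of the data points, so its distance to some row of `X` is zero.

```python
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]).reshape(-1, 1)

# Centroid values copied from the question's output (not recomputed here)
centroids = np.array([[0.85566274], [0.85042336], [0.82019115]])


def medoid_row_indices(X, centers):
    """Return, for each center, the index of the closest row of X (hypothetical helper)."""
    # Pairwise Euclidean distances, shape (n_centers, n_samples)
    distances = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
    return distances.argmin(axis=1)


indices = medoid_row_indices(X, centroids)
for center, index in zip(centroids, indices):
    print(f"For centroid {center}, {index} {X[index]}")
```

When the fitted model is available, reading `medoid_indices_` as in the answer is simpler and avoids ambiguity if the data contains duplicate values.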
    

    Alternatively, you can store the results in a pandas DataFrame, depending on your needs:

    import pandas as pd
    
    df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                       'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                       'index': kmedoids.medoid_indices_})
    print(df)
    

    Output

       label    medoid  index
    0      0  0.855663      1
    1      1  0.850423      5
    2      2  0.820191      7
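
If a plain array is preferred over a DataFrame, a similar table can be built for every data point rather than only for the medoids. This is a sketch under two assumptions that are not part of the original answer: the medoid indices `[1, 5, 7]` are copied from the output above instead of being read from a fitted model, and each point is assigned to its nearest medoid. The absolute-difference distance used here only works for single-feature data like this example.

```python
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
              0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]).reshape(-1, 1)

# Medoid indices copied from the output above (assumption: not recomputed here)
medoid_indices = np.array([1, 5, 7])
medoids = X[medoid_indices]

# Distance from every point to every medoid, shape (n_samples, n_medoids);
# valid for one feature only, because it broadcasts (n, 1) against (1, k)
distances = np.abs(X - medoids.T)
labels = distances.argmin(axis=1)

# One row per data point: point index, cluster label, index of its medoid
assignment = np.column_stack([np.arange(len(X)), labels, medoid_indices[labels]])
print(assignment)
```

Each row of `assignment` can then be used to look up both the point (`X[row[0]]`) and its medoid (`X[row[2]]`).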
    

    【Comments】:

    • Thanks, @Tonechas. Can we return the result in a DataFrame or an array?