【Question title】: Plot cluster matrix
【Posted】: 2020-03-20 21:25:11
【Question】:

I want to plot a cluster matrix from scikit-learn's K-means, using the following pandas DataFrame:

import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()  # toy dataset
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df = data.iloc[:, 4:8].copy()  # select a subset of four columns
df.columns = ['smoothness', 'compactness', 'concavity', 'concave points']
df.head()

+----+--------------+---------------+-------------+------------------+
|    |   smoothness |   compactness |   concavity |   concave points |
|----+--------------+---------------+-------------+------------------|
|  0 |      0.1184  |       0.2776  |      0.3001 |          0.1471  |
|  1 |      0.08474 |       0.07864 |      0.0869 |          0.07017 |
|  2 |      0.1096  |       0.1599  |      0.1974 |          0.1279  |
|  3 |      0.1425  |       0.2839  |      0.2414 |          0.1052  |
|  4 |      0.1003  |       0.1328  |      0.198  |          0.1043  |
+----+--------------+---------------+-------------+------------------+

【Comments】:

    Tags: python pandas matplotlib scikit-learn cluster-analysis


    【Solution 1】:

    IIUC, you can simply use seaborn.pairplot and pass the fitted model's KMeans.labels_ as the hue argument. For example:

    import seaborn as sns
    from sklearn.cluster import KMeans
    
    def kmeans_scatterplot(df, n_clusters):
        km = KMeans(init='k-means++', n_clusters=n_clusters)
        km_clustering = km.fit(df)
        sns.pairplot(df.assign(hue=km_clustering.labels_), hue='hue')
    
    kmeans_scatterplot(df, 2)
    

    [Out]: (pair plot figure not included)
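The approach above fits K-means once on all columns, so every point keeps the same cluster colour in every panel. A minimal sketch of the labelling step on its own, so the labels can be inspected before plotting. The helper name `add_cluster_labels`, the `cluster` column name, and the `n_init=10` and `random_state=0` settings are illustrative assumptions, not part of the original answer:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer


def add_cluster_labels(df, n_clusters, random_state=0):
    """Fit K-means once on all columns and return a copy with a 'cluster' column.

    The fixed random_state is an assumption made here for reproducibility.
    """
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                random_state=random_state)
    labels = km.fit_predict(df)
    return df.assign(cluster=labels)


# Rebuild the four-column frame from the question.
cancer = load_breast_cancer()
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df = data.iloc[:, 4:8].copy()
df.columns = ["smoothness", "compactness", "concavity", "concave points"]

labelled = add_cluster_labels(df, n_clusters=2)
print(labelled["cluster"].value_counts())

# To draw the matrix, pass the labelled frame to seaborn (requires seaborn):
# import seaborn as sns
# sns.pairplot(labelled, hue="cluster")
```

Keeping the labels in a named column also makes it straightforward to reuse them in other plots or summaries.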

    【Comments】:

      【Solution 2】:

      You can do it like this:

      import matplotlib.pyplot as plt
      from sklearn.cluster import KMeans
      
      def kmeans_scatterplot(df, n_clusters):
          axs_length = len(df.columns) 
          fig, axs = plt.subplots(axs_length, axs_length, figsize=(20,20))
      
          for i, column_i in enumerate(df):
              for j, column_j in enumerate(df):
      
                  # create plot
                  if column_i != column_j:
                      df_temp = df[[column_i, column_j]]
                      km = KMeans(init='k-means++', n_clusters=n_clusters)
                      km_clustering = km.fit(df_temp)
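                      # x follows the grid column (column_j) and y the grid row (column_i), matching the axis labels below
                      axs[i][j].scatter(df_temp[column_j], df_temp[column_i], c=km_clustering.labels_, cmap='rainbow', alpha=0.7, edgecolors='b')
      
                  # only show left and bottom labels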
                  if i == axs_length - 1:
                      axs[i][j].set_xlabel(column_j)
                  if j == 0:
                      axs[i][j].set_ylabel(column_i)
      
      kmeans_scatterplot(df, 2)
      

      Result: (figure not included)
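Note that the function above refits K-means for each pair of columns, so a point's cluster (and colour) can differ from panel to panel. A minimal sketch of a variant that clusters once on all columns and reuses the same labels in every panel. This is a swapped-in design, not the answer's method; the name `kmeans_scatter_matrix`, the `n_init=10` and `random_state=0` settings, and the Agg backend line are assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer


def kmeans_scatter_matrix(df, n_clusters, random_state=0):
    """Cluster once on all columns, then colour every pairwise scatter with those labels."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(df)
    n = len(df.columns)
    fig, axs = plt.subplots(n, n, figsize=(12, 12), squeeze=False)
    for i, col_i in enumerate(df.columns):
        for j, col_j in enumerate(df.columns):
            ax = axs[i][j]
            if i != j:
                # x follows the grid column, y follows the grid row
                ax.scatter(df[col_j], df[col_i], c=labels, cmap="rainbow", alpha=0.7)
            # only label the bottom row and the left column
            if i == n - 1:
                ax.set_xlabel(col_j)
            if j == 0:
                ax.set_ylabel(col_i)
    return fig, axs, labels


# Rebuild the four-column frame from the question.
cancer = load_breast_cancer()
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df = data.iloc[:, 4:8].copy()
df.columns = ["smoothness", "compactness", "concavity", "concave points"]

fig, axs, labels = kmeans_scatter_matrix(df, n_clusters=2)
```

With an interactive backend, `plt.show()` displays the grid; `fig.savefig(...)` writes it to a file instead. The diagonal panels are left empty, as in the function above.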

      【Comments】:

      • Plotting directly with matplotlib should be banned, and the poster burned ;p