【Posted】: 2017-08-10 23:27:31
【Question】:
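I clustered my data with K-means in scikit-learn and then plotted the cluster regions following the scikit-learn example. Next, I ran K-means again within each cluster, and I want to draw the boundaries of the resulting sub-clusters on the same plot. I found this question useful, but when I applied its approach the axis ranges changed and a new figure appeared.
Edit: my function is as follows: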
import numpy as np
import matplotlib.pyplot as plt


def plot_pca_clusters_races_match(pca_km, reduced_data, pca_data_winner,
                                  race1_pca_km, race1_reduced_data, race1_pca_data_winner, race1_nclusters,
                                  race2_pca_km, race2_reduced_data, race2_pca_data_winner, race2_nclusters,
                                  plt_opt, fig_path, race_approach, n_clusters):
    """
    :param pca_km: K-means trained on PCA data (2 components)
    :param reduced_data: PCA components
    :param pca_data_winner: player_id, pca_component1, pca_component2, race_id, winner
    :param plt_opt: space required to plot cluster area
    :param fig_path: path to save the plot
    :param race_approach:
    :param n_clusters:
    :return:
    """
    race_id_list = ['Z', 'T', 'P']

    # 1- Plot cluster area
    x_min, x_max = reduced_data[:, 0].min() + plt_opt[0], reduced_data[:, 0].max() + plt_opt[1]
    y_min, y_max = reduced_data[:, 1].min() + plt_opt[2], reduced_data[:, 1].max() + plt_opt[3]
    # Grid step: 1/100 of the x range, reused for the sub-cluster grid below
    step = (x_max - x_min) / 100
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    # Predict a cluster label for every grid point
    Z = pca_km.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(1)
    plt.clf()

    # Plot cluster regions
    plt.imshow(Z, interpolation='nearest',
               extent=(xx.min(), xx.max(), yy.min(), yy.max()),
               cmap=plt.cm.Paired,
               aspect='auto', origin='lower')

    # 2- Plot cluster members
    # NOTE: the column indices -3, 2 and 3 are kept from the original code;
    # they do not match the five-column layout listed in the docstring.
    race_ids = np.unique(pca_data_winner[:, -3])

    # Find race type
    reduced_data_race1 = pca_data_winner[pca_data_winner[:, -3] == race_ids[0]]

    # Plot race 1
    plt.plot(reduced_data_race1[:, 2], reduced_data_race1[:, 3], '.', markersize=4, color='red',
             label=race_id_list[int(race_ids[0])])

    # Plot race 2
    # If the race is non-symmetric, change color of the cluster members
    if len(race_ids) > 1:
        reduced_data_race2 = pca_data_winner[pca_data_winner[:, -3] == race_ids[1]]
        plt.plot(reduced_data_race2[:, 2], reduced_data_race2[:, 3], '.', markersize=4, color='green',
                 label=race_id_list[int(race_ids[1])])

    # 3- Plot cluster centers
    markers = ['d', 'v', 's', '*', 'h', 'p', 'o']
    for cluster in range(pca_km.cluster_centers_.shape[0]):
        plt.scatter(pca_km.cluster_centers_[cluster, 0], pca_km.cluster_centers_[cluster, 1],
                    marker=markers[cluster % len(markers)], s=80, linewidths=1,
                    label='Cluster ' + str(cluster),
                    color='b', zorder=4)

    plt.xlabel('PC 1')
    plt.ylabel('PC 2')
    plt.legend(prop={'size': 8})

    # --------------------------------------------- Plot boundaries of sub-clusters
    x1_min, x1_max = race1_reduced_data[:, 0].min() + plt_opt[0], race1_reduced_data[:, 0].max() + plt_opt[1]
    y1_min, y1_max = race1_reduced_data[:, 1].min() + plt_opt[2], race1_reduced_data[:, 1].max() + plt_opt[3]
    xx1, yy1 = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(y1_min, y1_max, step))
    Z1 = race1_pca_km.predict(np.c_[xx1.ravel(), yy1.ravel()])
    Z1 = Z1.reshape(xx1.shape)

    # Plot sub-cluster boundaries using the sub-cluster grid and labels (Z1, not Z)
    plt.contour(xx1, yy1, Z1)
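For reference, here is a minimal sketch of the result I am after: sub-cluster boundaries drawn over the cluster regions on a single set of axes, with the axis limits left as `imshow` set them. It is only an illustration under assumptions, not my real pipeline. The helpers `nearest_center` and `label_grid` and all the center coordinates are hypothetical, and `nearest_center` merely stands in for `KMeans.predict` so that the snippet runs without scikit-learn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def nearest_center(points, centers):
    """Label each point with the index of its nearest center
    (hypothetical stand-in for KMeans.predict)."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def label_grid(centers, x_min, x_max, y_min, y_max, n=200):
    """Evaluate nearest-center labels on an n x n grid covering the box."""
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, n),
                         np.linspace(y_min, y_max, n))
    zz = nearest_center(np.c_[xx.ravel(), yy.ravel()], centers)
    return xx, yy, zz.reshape(xx.shape)


# Hypothetical centers: two top-level clusters, three sub-clusters of cluster 1
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
sub_centers = np.array([[2.0, -1.0], [2.0, 1.0], [3.0, 0.0]])

# One figure and one Axes; every artist below is drawn on this same `ax`
fig, ax = plt.subplots()

# 1) Top-level cluster regions as an image
xx, yy, zz = label_grid(centers, -4, 4, -3, 3)
ax.imshow(zz, interpolation='nearest',
          extent=(xx.min(), xx.max(), yy.min(), yy.max()),
          cmap=plt.cm.Paired, aspect='auto', origin='lower')
xlim, ylim = ax.get_xlim(), ax.get_ylim()  # remember the limits set by imshow

# 2) Sub-cluster labels on their own grid, which deliberately extends past x = 4
xx1, yy1, zz1 = label_grid(sub_centers, 0, 5, -2, 2)

# Hide grid points that belong to a different top-level cluster
parent = nearest_center(np.c_[xx1.ravel(), yy1.ravel()], centers)
zz1 = np.ma.masked_where(parent.reshape(xx1.shape) != 1, zz1)

# Half-integer levels fall between the integer labels, i.e. on the boundaries
levels = np.arange(len(sub_centers) - 1) + 0.5
ax.contour(xx1, yy1, zz1, levels=levels, colors='k', linewidths=1)

# 3) Put the limits back in case contour autoscaled the view
ax.set_xlim(xlim)
ax.set_ylim(ylim)

fig.savefig("subcluster_boundaries.png")
```

Passing the grid coordinates `xx1, yy1` to `contour` instead of `extent` places the lines in data coordinates. Working through an explicit `ax` rather than the `plt.*` state machine seems to be what keeps everything in one figure, and restoring `xlim`/`ylim` afterwards guards against the sub-cluster grid stretching the axes.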
【Comments】:
-
I think you forgot to describe your problem and ask a question here. ;-)
-
@ImportanceOfBeingErnest :D I added more details.
Tags: python matplotlib scikit-learn cluster-analysis contour