Plot and visualize results of clustering as a network graph (answers)

[Question title]: Plot and visualize results of clustering as a network graph
[Posted]: 2019-04-09 07:34:34
[Question description]:

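I'm experimenting with various clustering algorithms and string distance metrics in Python. The end goal is to cluster a list of strings (each usually one or two words long) based on various distance metrics (e.g. Levenshtein, Jaro).

I've already written code that computes the distances between the strings with different distance metrics (using the jellyfish package) and clusters them with the different algorithms provided by sklearn.cluster. Here is some example code for Jaro distance and MeanShift clustering: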

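import jellyfish
import numpy as np
import sklearn.cluster

tokens = np.array(["test1", "test2", "test3", "cat", "cat food", "apple", "apple pie"])

# Jaro returns a *similarity* in [0, 1] (1 = identical strings), hence the negation.
# Older jellyfish releases named this function jaro_distance.
distances = -1 * np.array([[jellyfish.jaro_similarity(w1, w2) for w1 in tokens] for w2 in tokens])

# Each row of the matrix is used as the feature vector of one token.
meanshift = sklearn.cluster.MeanShift()
meanshift.fit(distances)

# Map a consecutive key (0, 1, ...) to the sorted unique tokens of each cluster.
clusters = dict()
key = 0
for cluster_id in np.unique(meanshift.labels_):
    cluster = np.unique(tokens[np.nonzero(meanshift.labels_ == cluster_id)])
    clusters[key] = cluster.tolist()
    key += 1

# plot_clusters(clusters, ...)  # the plotting function this question is asking for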
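The loop that builds `clusters` is essentially a "group tokens by label" step. A minimal standard-library sketch of the same idea follows; `group_by_label` is a hypothetical helper, and the labels in the example are made up to stand in for `meanshift.labels_`.

```python
from collections import defaultdict


def group_by_label(tokens, labels):
    """Map a consecutive key (0, 1, ...) to the sorted unique tokens that
    carry each label, mirroring the question's loop."""
    members = defaultdict(set)
    for token, label in zip(tokens, labels):
        members[label].add(token)
    # np.unique sorts both the labels and the members, so sort both here too
    return {key: sorted(members[label])
            for key, label in enumerate(sorted(members))}


# Hypothetical labels standing in for meanshift.labels_
tokens = ["test1", "test2", "test3", "cat", "cat food", "apple", "apple pie"]
labels = [0, 0, 0, 1, 1, 2, 2]
print(group_by_label(tokens, labels))
```

In the question's code this would be called as `clusters = group_by_label(tokens, meanshift.labels_)`.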

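Now I'd like to plot/visualize/save the clustering results, ideally as a network graph similar to this one [1]. I'd be happy with a simple visualization in which the different clusters are easy to see (and count); that's why I built a dictionary containing only the cluster members. It would be even nicer if the visualization took the precomputed distances between the data points into account, but either way is fine with me. I just want a nice visualization to accompany the analysis of the actual clusters.

Does anyone have ideas or suggestions on how to approach this? Any help would be greatly appreciated!

Thanks!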

[1] https://www.kdnuggets.com/wp-content/uploads/k-means-datasci.jpg

Disclaimer: I'm new to Python and machine learning.


Tags: python matplotlib graph cluster-analysis networkx

[Solution 1]:

It doesn't show the distances yet, but you could make a colored scatter plot, for example:

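import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

plt.figure()
clustercount = len(clusters)
# One distinct color per cluster, sampled evenly from the rainbow colormap
color = iter(cm.rainbow(np.linspace(0, 1, clustercount)))

for cl in clusters:
    c = next(color)
    # The string tokens have no coordinates of their own: `cluster_xy` is a
    # hypothetical dict mapping each cluster id to an (n_points, 2) array.
    xy = np.asarray(cluster_xy[cl])
    x, y = xy[:, 0], xy[:, 1]
    label = f"cluster {cl}"

    plt.scatter(x, y, color=c, label=label)

plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc=2)  # loc=2 is the upper-left corner
plt.show()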

This shows each cluster in a different color, so you can easily see and count them.
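The scatter plot above still needs x/y coordinates, which the string tokens do not have. One way to get positions that reflect the precomputed distances is classical (Torgerson) multidimensional scaling. Below is a minimal numpy sketch; the function name `classical_mds` is hypothetical, and because 1 - Jaro similarity is not guaranteed to be a Euclidean distance, the resulting layout is only an approximation. scikit-learn's `sklearn.manifold.MDS` is an alternative that also accepts precomputed dissimilarities.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in `n_components` dimensions so that their Euclidean
    distances approximate the symmetric distance matrix `dist`
    (classical / Torgerson MDS)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale
```

With the question's matrix, which holds negated Jaro similarities, `classical_mds(1 + distances)` (i.e. 1 - similarity, so identical strings get distance 0) would give one (x, y) row per entry of `tokens`, which can then be split per cluster for the scatter plot.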

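MeanShift exposes the cluster centers as meanshift.cluster_centers_. Note that they live in the same space the model was fitted on (here the rows of the distance matrix, i.e. one dimension per token), so they have to be projected into the same 2-D coordinates as the points before you can plot them, for example in a fixed color to visualize the distances.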
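A minimal sketch that sidesteps that projection by using each cluster's mean 2-D position as its center (a swapped-in technique, not MeanShift's own centers) and draws a line from every member to it, which gives roughly the hub-and-spoke network look of the image in [1]. `plot_cluster_network` and its parameters are hypothetical names chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def plot_cluster_network(xy, labels, names, ax=None):
    """Scatter points colored by cluster, mark each cluster's mean position
    with a black 'x' and connect every member to it.
    Returns {label: centroid} for the plotted clusters."""
    xy = np.asarray(xy, dtype=float)
    labels = np.asarray(labels)
    if ax is None:
        ax = plt.gca()
    unique = np.unique(labels)
    colors = plt.cm.rainbow(np.linspace(0, 1, len(unique)))
    centroids = {}
    for label, color in zip(unique, colors):
        pts = xy[labels == label]
        center = pts.mean(axis=0)  # hypothetical stand-in for a cluster center
        centroids[label.item()] = center
        for p in pts:  # graph edges: member -> center
            ax.plot([p[0], center[0]], [p[1], center[1]], color=color, lw=0.8)
        ax.scatter(pts[:, 0], pts[:, 1], color=color, label=f"cluster {label}")
        ax.scatter(*center, color="black", marker="x")  # fixed color for centers
    # Label every node with its string so the clusters can be read off
    for (x, y), name in zip(xy, names):
        ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))
    ax.legend(loc=2)
    return centroids
```

Here `xy` can be any (n, 2) array of positions (for example an embedding of the distance matrix), `labels` would be `meanshift.labels_` and `names` the `tokens` array; `plt.savefig("clusters.png")` (hypothetical file name) saves the figure instead of showing it.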
