Redundant Legends: Matplotlib

【Question title】: Redundant Legends: Matplotlib
【Posted】: 2020-08-14 18:10:30
【Question】:

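My scatter plot has redundant legend entries. Here is an image of my plot.

I have already checked the following existing question on this site: too many legend with array column data in matplotlib

It did not help, though. I think I am facing a completely different problem. Please tell me how to fix it.

Here is my code: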

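import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# N_Clus and WorkingDF2 are defined earlier in the original (unposted) code.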
colors = cm.rainbow(np.linspace(0, 1, N_Clus))
cluster_labels_2 = list(range(1, N_Clus+1))
print("cluster_labels: ", cluster_labels_2)
# Create a figure
plt.figure(figsize=(15,8))
s=0
for color, label in zip(colors, np.asarray(cluster_labels_2).flatten()):
    subset = WorkingDF2[WorkingDF2.Cluster == label]    
    for i in subset.index:
        x=np.asarray(subset["Standardized COVID-19 Index"][i]).flatten()
        y=np.asarray(subset["Standardized CSS Index"][i]).flatten() 
        plt.text(x, y, str(subset['Neighbourhood'][i]), rotation=25) 
        s += 1
        plt.scatter(x, y, c=np.array([color]), label='cluster'+str(label),alpha=0.5)
plt.legend(loc='lower right', fontsize=15)
plt.xlabel('Standardized COVID-19 Index', fontsize=18)
plt.ylabel('Standardized CSS Index', fontsize=18)
plt.title(
    "[Hierarchical Clustering: {} Cluster]\n"
    "Mapping of Non-Outlier Neighbourhoods\n"
    "onto Standardized CSS-COVID19 Indices Space\n".format(N_Clus),
    fontsize=18,
)
print('# of Neighbours: ', s)

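【Comments】:

This is not reproducible. Please include enough data and imports to make a Minimal, Reproducible Example (e.g. code, data, errors, current output, expected output), as text. NameError: name 'N_Clus' is not defined.
Do the plot command outside of the loop that labels each Neighbourhood.
Hi Trenton McKinney, thank you for your message. I will delete it for now and think about how to create reproducible code, because the original code is too long. Sorry, I am new here and still learning the rules. Thanks.
See How to provide a reproducible copy of your DataFrame using df.head(30).to_clipboard(), then edit your question and paste the clipboard into a code block. A proper question must provide all the information needed to give a correct answer.
@Jody Klymak Thank you. I will work on it!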

Tags: python matplotlib legend

【Solution 1】:

The problem is in this line:

plt.scatter(x, y, c=np.array([color]), label='cluster'+str(label),alpha=0.5)

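Here you give every coloured point the label 'cluster' + str(label), even when that label has already been used, so plt.legend() creates many identical legend entries. I would keep track of the labels already used and, whenever the label is not new, set the label of the current plot to None so that plt.legend() ignores it.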
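To make the mechanism concrete, here is a minimal sketch with made-up points (not the asker's data). It assumes the non-interactive "Agg" backend and uses `Axes.get_legend_handles_labels()` to inspect what the legend would contain. The names `seen`, `duplicated_labels` and `deduplicated_labels` are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Case 1: three scatter calls that all carry the same label, as in the question.
fig1, ax1 = plt.subplots()
for x in (0.0, 1.0, 2.0):
    ax1.scatter([x], [x], color="tab:blue", label="cluster1")
_, duplicated_labels = ax1.get_legend_handles_labels()

# Case 2: only the first call gets the label; later calls pass label=None.
fig2, ax2 = plt.subplots()
seen = set()  # labels that already have a legend entry
for x in (0.0, 1.0, 2.0):
    name = "cluster1"
    label = name if name not in seen else None
    seen.add(name)
    ax2.scatter([x], [x], color="tab:blue", label=label)
_, deduplicated_labels = ax2.get_legend_handles_labels()

print(duplicated_labels)
print(deduplicated_labels)
```

The first list holds one entry per scatter call, while the second holds a single entry, because artists without a label are left out of the legend.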

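Note that your naming may be a little confusing: matplotlib uses "label" for the name of a curve as it appears in the legend, whereas you use it for the cluster number. Shall we call it cluster_number instead?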

Here is the implementation:

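import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# N_Clus and WorkingDF2 come from the asker's own (unposted) code.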

colors = cm.rainbow(np.linspace(0, 1, N_Clus))
cluster_labels_2 = list(range(1, N_Clus+1))
print("cluster_labels: ", cluster_labels_2)

# Create a figure.
plt.figure(figsize=(15, 8))
s=0
clusters_already_in_the_legend = []
for color, cluster_number in zip(colors, np.asarray(cluster_labels_2).flatten()):
    subset = WorkingDF2[WorkingDF2.Cluster == cluster_number]    
    for i in subset.index:
        x = np.asarray(subset["Standardized COVID-19 Index"][i]).flatten()
        y = np.asarray(subset["Standardized CSS Index"][i]).flatten() 
        plt.text(x, y, str(subset['Neighbourhood'][i]), rotation=25) 
        s += 1

        # Keeping track of the labels so that we don't legend them multiple times.
        if cluster_number not in clusters_already_in_the_legend:
            clusters_already_in_the_legend.append(cluster_number)
            label = f"Cluster {cluster_number}"
        else:
            label = None
        plt.scatter(x, y, c=np.array([color]), label=label, alpha=0.5)

plt.legend(loc='lower right', fontsize=15)
plt.xlabel('Standardized COVID-19 Index', fontsize=18)
plt.ylabel('Standardized CSS Index', fontsize=18)
plt.title(
    "[Hierarchical Clustering: {} Cluster]\n"
    "Mapping of Non-Outlier Neighbourhoods\n"
    "onto Standardized CSS-COVID19 Indices Space\n".format(N_Clus),
    fontsize=18,
)
print('# of Neighbours: ', s)
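As an alternative, the suggestion in the comments (plot outside the per-neighbourhood loop) can be sketched as follows: one scatter call per cluster, so each cluster gets exactly one legend entry without any bookkeeping. This is only a sketch under assumptions. `demo_df` is a tiny synthetic stand-in for WorkingDF2 with invented values, and the "Agg" backend is assumed so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X_COL = "Standardized COVID-19 Index"
Y_COL = "Standardized CSS Index"

# Synthetic stand-in for WorkingDF2; the values are invented for illustration only.
demo_df = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E"],
    X_COL: [0.1, 0.4, -0.3, 1.2, 0.9],
    Y_COL: [0.5, -0.2, 0.3, 1.0, -0.7],
    "Cluster": [1, 1, 2, 2, 3],
})

n_clusters = demo_df["Cluster"].nunique()
colors = plt.get_cmap("rainbow")(np.linspace(0, 1, n_clusters))

fig, ax = plt.subplots(figsize=(15, 8))
for color, (cluster_number, subset) in zip(colors, demo_df.groupby("Cluster")):
    # One scatter call per cluster, hence one legend entry per cluster.
    ax.scatter(subset[X_COL], subset[Y_COL], color=color, alpha=0.5,
               label=f"Cluster {cluster_number}")
    # Text annotations still need one call per point.
    for x, y, name in zip(subset[X_COL], subset[Y_COL], subset["Neighbourhood"]):
        ax.text(x, y, str(name), rotation=25)

ax.legend(loc="lower right", fontsize=15)
ax.set_xlabel(X_COL, fontsize=18)
ax.set_ylabel(Y_COL, fontsize=18)

_, legend_labels = ax.get_legend_handles_labels()
print(legend_labels)
```

Compared with tracking labels by hand, this version draws each cluster in a single call, which also tends to be faster when there are many points.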

【Comments】:

Thanks for your suggestion. After a few minor modifications, it worked! Awesome!
Two modifications: 1) subset = WorkingDF2[WorkingDF2.Cluster == cluster_number]; 2) label ==> label_