Add legend to scatter plot (PCA)

[Question title]: Add legend to scatter plot (PCA)
[Posted]: 2018-11-12 06:09:30
[Question description]:

I am new to Python and found this excellent PCA biplot suggestion (Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)). Now I am trying to add a legend for the different targets, but the command plt.legend() does not work.

Is there an easy way to do this? As an example, here is the iris data with the biplot code from the link above.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target
#In general a good idea is to scale the data
scaler = StandardScaler()
scaler.fit(X)
X=scaler.transform(X)    

pca = PCA()
x_new = pca.fit_transform(X)

def myplot(score,coeff,labels=None):
    xs = score[:,0]
    ys = score[:,1]
    n = coeff.shape[0]
    scalex = 1.0/(xs.max() - xs.min())
    scaley = 1.0/(ys.max() - ys.min())
    plt.scatter(xs * scalex,ys * scaley, c = y)
    for i in range(n):
        plt.arrow(0, 0, coeff[i,0], coeff[i,1],color = 'r',alpha = 0.5)
        if labels is None:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, "Var"+str(i+1), color = 'g', ha = 'center', va = 'center')
        else:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, labels[i], color = 'g', ha = 'center', va = 'center')
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))
    plt.grid()

#Call the function. Use only the 2 PCs.
myplot(x_new[:,0:2],np.transpose(pca.components_[0:2, :]))
plt.show()

Any suggestions on the PCA biplot are welcome! Other code is fine too, if it is easier to add the legend another way!

[Question discussion]:

Have you read matplotlib.org/users/legend_guide.html?
Yes, but I don't understand how to add it to the existing code :(

Tags: python matplotlib legend pca biplot

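[Solution 1]:

I recently proposed a simple way to add a legend to a scatter plot; see the GitHub PR. It is still under discussion.

In the meantime, you need to create the legend manually from the unique labels in y. For each label, create a Line2D object with the same marker and color as the corresponding points in the scatter plot, then pass these objects as handles to plt.legend.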

scatter = plt.scatter(xs * scalex,ys * scaley, c = y)
labels = np.unique(y)
handles = [plt.Line2D([],[],marker="o", ls="", 
                      color=scatter.cmap(scatter.norm(yi))) for yi in labels]
plt.legend(handles, labels)
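
To try the manual legend on its own, outside the biplot function, a minimal sketch could look like the one below. The data here is a synthetic stand-in for the scaled PCA scores and iris targets (an assumption made so the snippet runs without scikit-learn), and the Agg backend is chosen only so it works without a display. The second panel uses `PathCollection.legend_elements()`, which is available in Matplotlib 3.1 and later; whether that method is the direct outcome of the PR mentioned in this answer is not confirmed here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): 2-D points playing the role of the scaled PCA
# scores, with three integer class labels like iris.target.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
scores = rng.normal(size=(150, 2)) + y[:, None]

fig, (ax_manual, ax_auto) = plt.subplots(1, 2, figsize=(9, 4))

# Manual legend: one proxy Line2D per unique label, colored through the same
# colormap and normalization that the scatter plot used.
scatter = ax_manual.scatter(scores[:, 0], scores[:, 1], c=y)
labels = np.unique(y)
handles = [
    plt.Line2D([], [], marker="o", ls="", color=scatter.cmap(scatter.norm(yi)))
    for yi in labels
]
ax_manual.legend(handles, [str(yi) for yi in labels], title="target")

# Built-in alternative (Matplotlib >= 3.1): legend_elements() returns
# ready-made handles and labels for the values passed as c=.
scatter_auto = ax_auto.scatter(scores[:, 0], scores[:, 1], c=y)
auto_handles, auto_labels = scatter_auto.legend_elements()
ax_auto.legend(auto_handles, auto_labels, title="target")

# Render the figure; in an interactive session, drop the Agg line above
# and call plt.show() instead.
fig.canvas.draw()
```

Inside the question's `myplot` function, the same idea applies: keep the return value of `plt.scatter(...)` and build the handles from it before calling `plt.legend`.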

[Discussion]:

[Solution 2]:

Try the "pca" library. It plots the explained variance and creates a biplot.

pip install pca

from pca import pca

# Initialize to reduce the data up to the number of components that explain 95% of the variance.
model = pca(n_components=0.95)

# Or reduce the data towards 2 PCs
model = pca(n_components=2)

# Load example dataset
import pandas as pd
import sklearn
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)

# Fit transform
results = model.fit_transform(X)

# Plot explained variance
fig, ax = model.plot()

# Scatter first 2 PCs
fig, ax = model.scatter()

# Make biplot with the number of features
fig, ax = model.biplot(n_feat=4)

The results are returned as a dictionary containing many statistics for the PCs, loadings, etc.
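
To make concrete what such a dictionary of PCs and loadings might hold, here is a rough NumPy-only sketch. It does not use the `pca` library and does not reproduce its actual output structure: the function name `pca_summary` and the key names are hypothetical, chosen for illustration, and the input is random stand-in data rather than iris.

```python
import numpy as np

def pca_summary(X, n_components=2):
    """Return PCA scores, loadings and explained-variance ratios in a dict.

    Hypothetical helper: the key names are chosen for this sketch only.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column (zero mean, unit variance), as in the question.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return {
        "scores": (U * S)[:, :n_components],   # samples projected onto the PCs
        "loadings": Vt[:n_components],         # one row per PC, one column per feature
        "explained_variance_ratio": explained[:n_components],
    }

# Stand-in data (assumption): 100 samples, 4 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] += 2.0 * X[:, 0]

summary = pca_summary(X)
print(summary["scores"].shape, summary["loadings"].shape)
```

The scores are what a scatter of the first two PCs would plot, and the loadings are what a biplot draws as arrows, one per original feature.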

[Discussion]: