Visualizing high-dimensional data in matplotlib/Python

【Question title】: visualizing high dimension data in matplotlib/python
【Posted】: 2018-07-16 06:50:48
【Question】:

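I am trying to fit a smooth function to some data points using Gaussian processes. I am using the scikit-learn library for Python. In my case the input is a 2D spatial coordinate and the output is a transformed version of it, which is also a 2D spatial coordinate. I generated some dummy test data and tried to fit a GP model to it. The code I used is as follows: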

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
import numpy as np

# Some dummy data
X = np.random.rand(10, 2)
Y = np.sin(X)

# Use the squared exponential kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, Y)
print(X)
# Evaluate on a test point
test = np.random.rand(1, 2)
test[:, 0] = 1.56
test[:, 1] = 0.92
y_pred, sigma = gp.predict(test, return_std=True)
print(test, np.sin(test))  # The true value
print(y_pred, sigma)  # The predicted value and the STD

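I would like to know whether there is a good way to visualize the model fit. Since both my inputs and my outputs are two-dimensional, I am not sure how to visualize this quickly to get a feel for the fit (in particular, I want to see how smooth the predictions are between the data points and how large the predictive variance is there). Most online examples, of course, cover only the one-dimensional case.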

【Comments】:

Tags: python matplotlib scikit-learn data-visualization

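【Answer 1】:

I assume what you need is Principal Component Analysis (PCA), a statistical technique that reduces the dimensionality of a dataset while preserving as much of its variance as possible in the lower-dimensional representation.

In Python: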

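import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X and Y are the 2D input and output arrays from the question

# Reduce the inputs to one dimension
pca_x = PCA(n_components=1)
X1D = pca_x.fit_transform(X)

# Reduce the outputs to one dimension
pca_y = PCA(n_components=1)
y1D = pca_y.fit_transform(Y)

# Plot markers only, since the samples are not ordered along X1D
plt.plot(X1D, y1D, 'o')
plt.show()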

Here n_components=d, where d is the desired reduced dimension.

For details, see the scikit-learn documentation for sklearn.decomposition.PCA.
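Before relying on a 1D projection, it may help to check how much variance the single retained component actually keeps. The snippet below is a minimal sketch of that check using PCA's explained_variance_ratio_ attribute. The fixed seed and the sample count of 200 are assumptions made for illustration and are not part of the question.

```python
import numpy as np
from sklearn.decomposition import PCA

# Dummy data in the style of the question; seed and sample count are
# assumptions chosen only to make the sketch reproducible.
rng = np.random.default_rng(0)
X = rng.random((200, 2))   # 2D inputs
Y = np.sin(X)              # 2D outputs

# Fit one-component PCA models separately on inputs and outputs
pca_x = PCA(n_components=1).fit(X)
pca_y = PCA(n_components=1).fit(Y)

# Fraction of each dataset's total variance kept by the single component
kept_x = pca_x.explained_variance_ratio_[0]
kept_y = pca_y.explained_variance_ratio_[0]
print(f"variance kept in X: {kept_x:.2f}, in Y: {kept_y:.2f}")
```

For 2D data the first component always keeps at least half of the variance. When the inputs are spread evenly over a square, as here, the kept fraction can be close to that lower bound, in which case the 1D plot hides a lot of the structure.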

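Another approach is t-distributed Stochastic Neighbor Embedding (t-SNE for short), which is also used to visualize high-dimensional data. A Python implementation is available in scikit-learn as sklearn.manifold.TSNE.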
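As a rough sketch of how t-SNE could be applied here, one option is to stack each input with its output into a single 4D point and embed those points in 2D. Treating the data this way is an assumption of this example, not something stated in the answer, and so are the seed, the sample count, and the output file name. Note that scikit-learn expects perplexity to be smaller than the number of samples, so the 10 points from the question would need a much smaller perplexity than the default of 30.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Dummy data in the style of the question (seed and size are assumptions)
rng = np.random.default_rng(0)
X = rng.random((200, 2))
Y = np.sin(X)

# One 4D point per sample: two input coordinates plus two output coordinates
Z = np.hstack([X, Y])

# Embed the 4D points in 2D; perplexity must be smaller than the sample count
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
Z2D = tsne.fit_transform(Z)

# Color each embedded point by its first input coordinate
fig, ax = plt.subplots()
sc = ax.scatter(Z2D[:, 0], Z2D[:, 1], c=X[:, 0], s=15)
fig.colorbar(sc, ax=ax, label="first input coordinate")
ax.set_title("t-SNE embedding of stacked inputs and outputs")
fig.savefig("tsne_embedding.png")  # arbitrary file name
```

Unlike PCA, t-SNE does not preserve distances globally, so the embedding is better suited to spotting clusters than to judging smoothness.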

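【Comments】:

That is a good idea, thank you. I suppose I can sample the function at many points and then apply PCA. I will give it a try.
@Luca I have also added another option, t-SNE. Take a look.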
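Following up on the idea in the comments of sampling the fitted function at many points, the sketch below evaluates the GP on a regular grid over the input square and draws the predictive mean and standard deviation of each output as a contour plot. This is one possible approach that skips dimensionality reduction; it is not the method proposed in the answer. The grid resolution, the seed, the reduced number of optimizer restarts, and the file name are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Same setup as in the question, with a fixed seed (an assumption)
rng = np.random.default_rng(0)
X = rng.random((10, 2))
Y = np.sin(X)

kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=2, random_state=0)
gp.fit(X, Y)

# Regular n x n grid over the unit square covered by the training inputs
n = 40
g = np.linspace(0.0, 1.0, n)
G0, G1 = np.meshgrid(g, g)
grid = np.column_stack([G0.ravel(), G1.ravel()])  # shape (n*n, 2)

mean, std = gp.predict(grid, return_std=True)
# Depending on the scikit-learn version, std is per sample or per sample and
# output; broadcast it so that it always matches the shape of mean.
std = np.broadcast_to(np.asarray(std).reshape(len(grid), -1), mean.shape)

# Rows: output dimension; columns: predictive mean and standard deviation
fig, axes = plt.subplots(2, 2, figsize=(9, 7))
for k in range(2):
    for col, (vals, name) in enumerate([(mean, "mean"), (std, "std")]):
        ax = axes[k, col]
        cs = ax.contourf(G0, G1, vals[:, k].reshape(n, n), levels=20)
        ax.scatter(X[:, 0], X[:, 1], c="k", s=15)  # training inputs
        ax.set_title(f"output {k}: predictive {name}")
        fig.colorbar(cs, ax=ax)
fig.tight_layout()
fig.savefig("gp_fit.png")  # arbitrary file name
```

In the standard deviation panels, the uncertainty should shrink near the black training points and grow away from them, which gives a direct view of the variance between data points.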