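【Posted】: 2020-05-09 23:39:10
【Question】:
I thought eigenvectors had to be orthogonal to each other. The following seems to violate that. I'd like to check whether I'm doing something wrong. Thanks for any insight!
Here is the code for the PCA (data at the bottom of the post):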
import numpy as np
from numpy import mean
from numpy import cov
from numpy.linalg import eig

# df is the 13 x 20 DataFrame built at the bottom of the post
# calculate the mean of each column
M = mean(df.T, axis=1)
# center columns by subtracting column means
C = df - M
# calculate covariance matrix (cov treats rows as variables, hence the transpose,
# and it centers the data itself)
V = cov(df.T)
# eigendecomposition of covariance matrix; column i of vectors pairs with values[i]
values, vectors = eig(V)
# project data
P = vectors.T.dot(C.T)
# make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(values[i]), vectors[:, i]) for i in range(len(values))]
# sort the (eigenvalue, eigenvector) tuples from high to low
eig_pairs.sort(key=lambda x: x[0], reverse=True)
# stack the two leading eigenvectors as the columns of a 20 x 2 matrix
matrix_w = np.hstack((eig_pairs[0][1].reshape(20, 1), eig_pairs[1][1].reshape(20, 1)))
# print('Matrix W:\n', matrix_w)
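All I did to plot the eigenvectors here was grab the first two rows of matrix_w. Is that correct? I just typed them by hand into the array M. Is my matrix_w wrong, or did I pick the wrong vectors for the first two principal components?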
```python
import numpy as np
import matplotlib.pyplot as plt

# first two rows of matrix_w, typed in by hand
M = np.array([[0.00747255, 0.16222854], [-0.18394907, 0.12426324]])
rows, cols = M.T.shape
# get absolute maxes for axis ranges to center origin
maxes = 1.1 * np.amax(abs(M), axis=0)
for i in range(cols):
    xs = [0, M[i, 0]]
    ys = [0, M[i, 1]]
    plt.plot(xs, ys)
plt.plot(0, 0, 'ok')  # <-- plot a black point at the origin
plt.axis('equal')  # <-- set the axes to the same scale
plt.legend(['V' + str(i + 1) for i in range(cols)])  # <-- give a legend
plt.grid(True, which='major')  # <-- plot grid lines
plt.show()
```
This is what the plotted vectors look like, but they are not orthogonal.
Here is the data (already normalized with np.log):
data = [[1.954242509439325,
1.6901960800285136,
1.9444826721501687,
1.2787536009528289,
1.7558748556724915,
1.7075701760979363,
1.2787536009528289,
1.3222192947339193,
1.4313637641589874,
1.3222192947339193,
1.9084850188786497,
1.8750612633917,
1.6434526764861874,
1.8512583487190752,
1.3424226808222062,
1.9590413923210936,
1.9294189257142926,
1.8692317197309762,
1.4771212547196624,
1.414973347970818],
[1.9138138523837167,
1.0,
1.7781512503836436,
0.3010299956639812,
1.7403626894942439,
1.6127838567197355,
0.47712125471966244,
0.3010299956639812,
0.6020599913279624,
0.3010299956639812,
1.8260748027008264,
1.8512583487190752,
0.9542425094393249,
1.662757831681574,
1.9030899869919435,
1.8195439355418688,
1.380211241711606,
1.9731278535996986,
0.6989700043360189,
1.255272505103306],
[1.9444826721501687,
1.6232492903979006,
1.7993405494535817,
0.6020599913279624,
1.8808135922807914,
1.724275869600789,
1.0413926851582251,
1.3617278360175928,
1.0413926851582251,
0.6989700043360189,
1.9395192526186185,
1.9242792860618816,
1.6020599913279623,
1.6532125137753437,
1.9444826721501687,
1.9731278535996986,
1.6720978579357175,
1.5563025007672873,
1.7558748556724915,
0.47712125471966244],
[1.9822712330395684,
1.792391689498254,
1.9912260756924949,
1.505149978319906,
1.792391689498254,
1.8260748027008264,
1.6334684555795864,
0.8450980400142568,
1.146128035678238,
1.146128035678238,
1.919078092376074,
1.9493900066449128,
1.7853298350107671,
1.9084850188786497,
1.1760912590556813,
1.4913616938342726,
1.9867717342662448,
1.1139433523068367,
1.724275869600789,
1.1760912590556813],
[1.9731278535996986,
1.5797835966168101,
1.6812412373755872,
1.0413926851582251,
1.8692317197309762,
1.568201724066995,
1.3617278360175928,
0.9542425094393249,
1.1139433523068367,
1.0791812460476249,
1.8808135922807914,
1.8808135922807914,
1.6232492903979006,
1.7558748556724915,
1.462397997898956,
1.9242792860618816,
1.9030899869919435,
1.919078092376074,
1.3010299956639813,
0.6989700043360189],
[1.9867717342662448,
1.7853298350107671,
1.9344984512435677,
1.4471580313422192,
1.8976270912904414,
1.863322860120456,
1.0791812460476249,
0.8450980400142568,
1.414973347970818,
1.3617278360175928,
1.9294189257142926,
1.9731278535996986,
1.919078092376074,
1.3010299956639813,
1.9590413923210936,
1.9731278535996986,
1.9731278535996986,
1.9242792860618816,
1.4913616938342726,
1.380211241711606],
[1.4313637641589874,
1.9344984512435677,
1.99563519459755,
1.3424226808222062,
1.9590413923210936,
1.7403626894942439,
1.8808135922807914,
1.2304489213782739,
1.3010299956639813,
1.380211241711606,
1.8808135922807914,
1.8325089127062364,
1.9493900066449128,
1.9590413923210936,
1.0413926851582251,
1.9777236052888478,
1.9731278535996986,
1.7558748556724915,
1.0413926851582251,
1.4471580313422192],
[1.8573324964312685,
1.414973347970818,
1.8864907251724818,
0.3010299956639812,
1.3424226808222062,
1.5314789170422551,
0.0,
0.6989700043360189,
1.3010299956639813,
0.47712125471966244,
1.3424226808222062,
1.7075701760979363,
0.9030899869919435,
1.2041199826559248,
1.9493900066449128,
1.8129133566428555,
1.8920946026904804,
1.9637878273455553,
0.7781512503836436,
0.9542425094393249],
[1.7403626894942439,
1.4913616938342726,
1.7853298350107671,
1.1760912590556813,
1.462397997898956,
1.5185139398778875,
0.0,
0.6989700043360189,
1.1760912590556813,
1.0413926851582251,
1.6901960800285136,
1.6232492903979006,
1.146128035678238,
1.6127838567197355,
1.7075701760979363,
1.7075701760979363,
1.8573324964312685,
1.4471580313422192,
1.1139433523068367,
1.0413926851582251],
[1.863322860120456,
1.8573324964312685,
1.9294189257142926,
1.3979400086720377,
1.4913616938342726,
1.8388490907372552,
1.0,
1.2304489213782739,
1.2787536009528289,
1.1760912590556813,
1.8976270912904414,
1.845098040014257,
1.662757831681574,
1.7853298350107671,
1.806179973983887,
1.9138138523837167,
1.6812412373755872,
1.7853298350107671,
1.6812412373755872,
1.4771212547196624],
[1.9822712330395684,
1.2304489213782739,
1.9637878273455553,
1.5440680443502757,
1.8195439355418688,
1.505149978319906,
1.2304489213782739,
1.0413926851582251,
1.7075701760979363,
1.6232492903979006,
1.9084850188786497,
1.8573324964312685,
1.6989700043360187,
1.806179973983887,
1.0413926851582251,
1.9637878273455553,
1.9590413923210936,
1.4771212547196624,
1.0413926851582251,
1.5314789170422551],
[1.9637878273455553,
1.2304489213782739,
1.919078092376074,
1.1139433523068367,
1.792391689498254,
1.7075701760979363,
0.6020599913279624,
1.2304489213782739,
1.4771212547196624,
1.1760912590556813,
1.7853298350107671,
1.8573324964312685,
1.5314789170422551,
1.7075701760979363,
1.0413926851582251,
1.7993405494535817,
1.9731278535996986,
1.4471580313422192,
0.3010299956639812,
1.792391689498254],
[1.4771212547196624,
1.7160033436347992,
1.99563519459755,
1.0413926851582251,
1.9030899869919435,
1.8750612633917,
1.255272505103306,
0.3010299956639812,
0.6989700043360189,
0.47712125471966244,
1.7558748556724915,
1.7160033436347992,
1.662757831681574,
1.9493900066449128,
0.6989700043360189,
1.9867717342662448,
1.3979400086720377,
1.4913616938342726,
0.47712125471966244,
0.9542425094393249]]
import pandas as pd

df = pd.DataFrame(data, columns=['Real coffee', 'Instant coffee', 'Tea', 'Sweetener', 'Biscuits',
                                 'Powder soup', 'Tin soup', 'Potatoes', 'Frozen fish', 'Frozen veggies',
                                 'Apples', 'Oranges', 'Tinned fruit', 'Jam', 'Garlic', 'Butter',
                                 'Margarine', 'Olive oil', 'Yoghurt', 'Crisp bread'])
【Discussion】:
-
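There is no general constraint on eigenvectors that says they have to be orthogonal. The eigenvectors of a correlation matrix should be orthogonal, though. Your sorting is hard to follow; why not use `np.dot(vectors[:, col_i], vectors[:, col_j])` to check the orthogonality of every pair of columns in `vectors`? If they are orthogonal, this dot product should be 0 (up to floating-point error) for all i and j, except i == j.
-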
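For anyone who wants to try the pairwise check described above, here is a minimal sketch. It runs on a small random covariance matrix as a stand-in (an assumption, not the poster's data), `max_offdiag_dot` is a hypothetical helper name, and it calls `np.linalg.eigh` rather than the `eig` used in the post, because `eigh` is intended for symmetric matrices and returns orthonormal eigenvectors.

```python
import numpy as np


def max_offdiag_dot(vectors):
    """Largest |dot product| between two different columns of `vectors`."""
    n = vectors.shape[1]
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            worst = max(worst, abs(np.dot(vectors[:, i], vectors[:, j])))
    return worst


# Stand-in data (assumption): 30 samples of 5 variables.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
V = np.cov(A.T)                        # 5 x 5 symmetric covariance matrix
values, vectors = np.linalg.eigh(V)    # column i of vectors pairs with values[i]

# Should be zero up to floating-point error if all columns are mutually orthogonal.
print(max_offdiag_dot(vectors))
```

The same helper can be pointed at the (20, 20) `vectors` array from the post. Values far from zero there would suggest looking at how the decomposition was computed.
-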
Consider sorting like this instead: `order = np.argsort(values)`, `matrix_w = vectors[:, order]`
-
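A short sketch of the `argsort` approach, again on stand-in random data (an assumption). Note that `np.argsort` sorts in ascending order, so the `[::-1]` reversal is added here to match the high-to-low ordering used in the post; keeping only the first two columns is likewise an illustration of how `matrix_w` could be built.

```python
import numpy as np

# Stand-in data (assumption): 30 samples of 5 variables.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 5))
V = np.cov(A.T)
values, vectors = np.linalg.eigh(V)

order = np.argsort(values)[::-1]      # indices from largest to smallest eigenvalue
values_sorted = values[order]
vectors_sorted = vectors[:, order]    # reorder columns so they stay paired with values
matrix_w = vectors_sorted[:, :2]      # two leading eigenvectors as columns, shape (5, 2)

print(values_sorted)
print(matrix_w.shape)
```

Indexing the columns with `order` avoids building and sorting a list of tuples, and each eigenvector stays in its own column.
-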
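Also, what is the shape of `vectors`? Unless it is 2×2, it looks like you have clipped the vectors, so of course they are no longer orthogonal; you have just projected them from (I assume) 20D down to 2D.
-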
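To put numbers on the clipping point, the sketch below takes the two rows typed into `M` in the question and measures the angle between them, then contrasts that with the full columns of a `matrix_w` built from stand-in random data (an assumption, since the poster's full `matrix_w` is not shown).

```python
import numpy as np

# Part 1: the two rows from the post, treated as 2-D vectors (as in the plot).
M = np.array([[0.00747255, 0.16222854], [-0.18394907, 0.12426324]])
row_dot = np.dot(M[0], M[1])
cos_angle = row_dot / (np.linalg.norm(M[0]) * np.linalg.norm(M[1]))
angle_deg = np.degrees(np.arccos(cos_angle))
print(row_dot, angle_deg)              # nonzero dot product, angle well short of 90 degrees

# Part 2: stand-in data (assumption): 30 samples of 6 variables.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 6))
values, vectors = np.linalg.eigh(np.cov(A.T))
matrix_w = vectors[:, np.argsort(values)[::-1][:2]]   # shape (6, 2)

# The two columns are complete 6-D eigenvectors, so their Gram matrix is the identity.
col_gram = matrix_w.T @ matrix_w
print(np.round(col_gram, 12))
```

In other words, the principal directions are the columns of `matrix_w`, each with one entry per original variable. A row holds only one variable's coefficients on the two components.
-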
@Dan The shape of vectors is (20, 20). I don't understand how to use np.dot to check orthogonality. Do I need to write a loop? Could I do something like matrix_w.dot(matrix_w.T)?
-
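You can use a loop. Otherwise, I think `vectors @ vectors.T` will probably do the dot product for every pair efficiently (just look at the lower triangle). Your orthogonality holds in 20D; there is no reason for it to survive when you project down to 2D. Think about what happens when you project 3D axes onto 2D (as in every 3D plot you have ever seen): the z axis is no longer orthogonal to x or y. That is basically what you are doing.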
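-
A sketch of the two ideas in this thread: the loop-free check and the 3D-axes analogy. The data is a random stand-in, and the isometric-style drawing matrix `P` is an assumed convention picked only for illustration. One detail to keep in mind: for eigenvectors stored as columns, the pairwise column dot products are the entries of `Q.T @ Q`. For a square orthogonal matrix `Q @ Q.T` gives the same identity, but for a tall matrix such as a two-column `matrix_w` only the `W.T @ W` form does.

```python
import numpy as np

# Stand-in data (assumption): 30 samples of 4 variables.
rng = np.random.default_rng(3)
S = np.cov(rng.normal(size=(30, 4)).T)
_, Q = np.linalg.eigh(S)               # columns of Q are orthonormal eigenvectors

gram = Q.T @ Q                         # entry (i, j) = dot(Q[:, i], Q[:, j])
print(np.allclose(gram, np.eye(4)))

W = Q[:, :2]                           # keep two columns, like matrix_w
print(np.allclose(W.T @ W, np.eye(2)))     # 2 x 2 Gram matrix of the columns
print(np.allclose(W @ W.T, np.eye(4)))     # 4 x 4 and rank 2, so not the identity

# 3-D unit axes drawn on a 2-D page (assumed isometric-style convention).
h = np.sqrt(3) / 2
axes_3d = np.eye(3)                    # rows are the x, y, z unit vectors
P = np.array([[-h, -0.5],              # where x lands on the page
              [h, -0.5],               # where y lands
              [0.0, 1.0]])             # where z lands
axes_2d = axes_3d @ P                  # shape (3, 2): each row is a drawn axis

print(axes_3d @ axes_3d.T)             # identity: the axes are orthogonal in 3-D
print(axes_2d @ axes_2d.T)             # off-diagonal entries are nonzero on the page
```

With this particular `P`, every pair of drawn axes has a nonzero dot product even though the original axes are exactly orthogonal, which is the same effect as plotting two rows of `matrix_w`.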
Tags: python numpy pca eigenvector orthogonal