用 numpy 同时对角化矩阵答案

【问题标题】：Simultaneously diagonalize matrices with numpy用 numpy 同时对角化矩阵
【发布时间】：2026-01-05 21:15:02
【问题描述】：

我有一个 m × n × n numpy.ndarray 的 m 个同时可对角化的方阵，并且会喜欢用numpy来获取它们的同时特征值。

例如，如果我有

from numpy import einsum, diag, array, linalg, random
U = linalg.svd(random.random((3,3)))[2]

M = einsum(
    "ij, ajk, lk",
    U, [diag([2,2,0]), diag([1,-1,1])], U)

M中的两个矩阵是同时可对角化的，我正在寻找获取数组的方法

array([[2.,  1.],
       [2., -1.],
       [0.,  1.]])

（直到行的排列）来自M。有没有内置的或简单的方法来获得这个？

【问题讨论】：

我认为要走的路是math.stackexchange.com/questions/12220/… 中的答案，尽管我仍然很欣赏这些步骤的良好实施。

标签： python numpy eigenvalue diagonal

【解决方案1】：

Cardoso 和 Soulomiac 在 1996 年发表了一种基于 Givens 旋转的相当简单且非常优雅的同步对角化算法：

Cardoso, J. 和 Souloumiac, A. (1996)。用于同时对角化的雅可比角。 SIAM 矩阵分析与应用杂志，17（1），161–164。 doi:10.1137/S0895479893259546

我在此响应的末尾附上了算法的 numpy 实现。警告：事实证明，同时对角化是一个有点棘手的数值问题，没有算法（据我所知）可以保证全局收敛。然而，它不起作用的情况（见论文）是退化的，实际上我从来没有让雅可比角算法失败过。

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Routines for simultaneous diagonalization
Arun Chaganty <arunchaganty@gmail.com>
"""

import numpy as np
from numpy import zeros, eye, diag
from numpy.linalg import norm

def givens_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 

    return A

def givens_double_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 
    A_i, A_j = A[:,i], A[:,j]
    A[:,i], A[:,j] = c * A_i + s * A_j, c * A_j - s * A_i 

    return A

def jacobi_angles( *Ms, **kwargs ):
    r"""
    Simultaneously diagonalize using Jacobi angles
    @article{SC-siam,
       HTML =   "ftp://sig.enst.fr/pub/jfc/Papers/siam_note.ps.gz",
       author = "Jean-Fran\c{c}ois Cardoso and Antoine Souloumiac",
       journal = "{SIAM} J. Mat. Anal. Appl.",
       title = "Jacobi angles for simultaneous diagonalization",
       pages = "161--164",
       volume = "17",
       number = "1",
       month = jan,
       year = {1995}}

    (a) Compute Givens rotations for every pair of indices (i,j) i < j 
            - from eigenvectors of G = gg'; g = A_ij - A_ji, A_ij + A_ji
            - Compute c, s as \sqrt{x+r/2r}, y/\sqrt{2r(x+r)}
    (b) Update matrices by multiplying by the givens rotation R(i,j,c,s)
    (c) Repeat (a) until stopping criterion: sin theta < threshold for all ij pairs
    """

    assert len(Ms) > 0
    m, n = Ms[0].shape
    assert m == n

    sweeps = kwargs.get('sweeps', 500)
    threshold = kwargs.get('eps', 1e-8)
    rank = kwargs.get('rank', m)

    R = eye(m)

    for _ in xrange(sweeps):
        done = True
        for i in xrange(rank):
            for j in xrange(i+1, m):
                G = zeros((2,2))
                for M in Ms:
                    g = np.array([ M[i,i] - M[j,j], M[i,j] + M[j,i] ])
                    G += np.outer(g,g) / len(Ms)
                # Compute the eigenvector directly
                t_on, t_off = G[0,0] - G[1,1], G[0,1] + G[1,0]
                theta = 0.5 * np.arctan2( t_off, t_on + np.sqrt( t_on*t_on + t_off * t_off) )
                c, s = np.cos(theta), np.sin(theta)

                if abs(s) > threshold:
                    done = False
                    # Update the matrices and V
                    for M in Ms:
                        givens_double_rotate(M, i, j, c, s)
                        #assert M[i,i] > M[j, j]
                    R = givens_rotate(R, i, j, c, s)

        if done:
            break
    R = R.T

    L = np.zeros((m, len(Ms)))
    err = 0
    for i, M in enumerate(Ms):
        # The off-diagonal elements of M should be 0
        L[:,i] = diag(M)
        err += norm(M - diag(diag(M)))

    return R, L, err

【讨论】：

我试过这个程序但是结果不对，我觉得应该是-t_on + np.sqrt(t_on * t_on + t_off * t_off) 在计算theta的那一行，G是a对称矩阵[[a, b],[b, c]]，最大特征值lambda为(a+c+sqrt((ac) ** 2+4 * b ** 2)/2，令特征向量对应这个特征值是 [[1],[t]] （没有归一化），然后 a + b*t = lambda, t = (lambda - a)/b = (-a+c+sqrt((ac) ** 2 +4 * b ** 2)/2/b = (-t_on + sqrt(t_on ** 2 + t_off **2)/t_off。您可以打印出每个外循环中非对角线元素的平方和以查看如果它单调减少
我还发现使用 arctan2 可能存在数值问题并且不会收敛。我改用 x, y = 1/(1+t ** 2), t/(1+t ** 2), c = sqrt(0.5 + x/2), s = y/2/c。
修正：x, y = 1/sqrt(1 + t ** 2), t/sqrt(1 + t ** 2)
-1 此代码不正确。 例如，它无法同时对单位矩阵 {{1, 0}, {0, 1}} 和 X矩阵 {{0, 1}, {1, 0}}。它应该返回一个 R，例如 {{1, 1}, {1, -1}}/sqrt(2)，但它会返回一个 R 的单位矩阵。

【解决方案2】：

我不知道有任何直接的解决方案。但是为什么不直接获取第一个矩阵的特征值和特征向量，并使用特征向量将所有其他矩阵转换为对角线形式呢？比如：

eigvals, eigvecs = np.linalg.eig(matrix1)
eigvals2 = np.diagonal(np.dot(np.dot(transpose(eigvecs), matrix2), eigvecs))

如果您愿意，可以通过hstack 将列添加到数组中。

更新：如下所述，这仅在没有退化特征值发生时才有效。否则必须先检查退化特征值，然后将第二个矩阵转换为 blockdiagonal 形式，并分别对大于 1x1 的最终块进行对角化。

【讨论】：

因为这些特征向量不一定是第二个矩阵的特征向量，所以对角线可能不是第二个矩阵的特征值（！）。如果在我的问题 2 中的示例中是第一个矩阵的双特征值，我会得到该特征值的两个“随机”特征向量。这些不一定是第二个矩阵的 1 和 -1 的特征向量，它们是零集的情况。
我同意，在退化特征值的情况下，事情会稍微复杂一些，因为你最终会得到一个块对角矩阵。但是，您可以分别对那些大于 1x1 的块进行对角化。将整个第二个矩阵对角化会更快。
将部分对角化的矩阵分割成块然后对角化块比用 python 编写更容易描述，特别是考虑到数值不准确，这就是为什么我要求一个好的方法来做到这一点。跨度>

【解决方案3】：

如果您事先知道两个矩阵的特征值的大小，您可以对两个矩阵的线性组合进行对角化，并选择系数来打破退化。例如，如果两者的特征值都在 -10 和 10 之间，您可以对角化 100*M1 + M2。精度略有下降，但对于许多用途来说已经足够了——而且快速简单！

【讨论】：

【解决方案4】：

我确信我的解决方案还有很大的改进空间，但我提出了以下三个函数，以一种半稳健的方式为我进行计算。

def clusters(array,
             orig_indices = None,
             start = 0,
             rtol=numpy.allclose.__defaults__[0],
             atol=numpy.allclose.__defaults__[1]):
    """For an array, return a permutation that sorts the numbers and the sizes of the resulting blocks of identical numbers."""
    array = numpy.asarray(array)
    if not len(array):
        return numpy.array([]),[]
    if orig_indices is None:
        orig_indices = numpy.arange(len(array))
    x = array[0]
    close = abs(array-x) <= (atol + rtol*abs(x))
    first = sum(close)
    r_perm, r_sizes = clusters(
        array[~close],
        orig_indices[~close],
        start+first,
        rtol, atol)
    r_sizes.insert(0, first)
    return numpy.concatenate((orig_indices[close], r_perm)), r_sizes

def permutation_matrix(permutation, dtype=dtype):
    n = len(permutation)
    P = numpy.zeros((n,n), dtype)
    for i,j in enumerate(permutation):
        P[j,i]=1
    return P

def simultaneously_diagonalize(tensor, atol=numpy.allclose.__defaults__[1]):
    tensor = numpy.asarray(tensor)
    old_shape = tensor.shape
    size = old_shape[-1]
    tensor = tensor.reshape((-1, size, size))
    diag_mask = 1-numpy.eye(size)

    eigvalues, diagonalizer = numpy.linalg.eig(tensor[0])
    diagonalization = numpy.dot(
        numpy.dot(
            matrix.linalg.inv(diagonalizer),
            tensor).swapaxes(0,-2),
        diagonalizer)
    if numpy.allclose(diag_mask*diagonalization, 0):
        return diagonalization.diagonal(axis1=-2, axis2=-1).reshape(old_shape[:-1])
    else:
        perm, cluster_sizes = clusters(diagonalization[0].diagonal())
        perm_matrix = permutation_matrix(perm)
        diagonalization = numpy.dot(
            numpy.dot(
                perm_matrix.T,
                diagonalization).swapaxes(0,-2),
            perm_matrix)
        mask = 1-scipy.linalg.block_diag(
            *list(
                numpy.ones((blocksize, blocksize))
                for blocksize in cluster_sizes))
        print(diagonalization)
        assert(numpy.allclose(
                diagonalization*mask,
                0)) # Assert that the matrices are co-diagonalizable
        blocks = numpy.cumsum(cluster_sizes)
        start = 0
        other_part = []
        for block in blocks:
            other_part.append(
                simultaneously_diagonalize(
                    diagonalization[1:, start:block, start:block]))
            start = block
        return numpy.vstack(
            (diagonalization[0].diagonal(axis1=-2, axis2=-1),
             numpy.hstack(other_part)))

【讨论】：

这种方法仍然存在问题，我还无法确定。谨慎使用！