高效地计算矩阵中所有行的运算答案

【问题标题】：Compute operations of all rows in matrix with itself efficiently高效地计算矩阵中所有行的运算
【发布时间】：2021-05-12 19:07:55
【问题描述】：

假设我有一个任意大小 (n, m) 的二维矩阵 M。现在我想有效地对 M 中的所有行与 M 中的所有其他行进行一些向量操作。就我而言，我想在每一行之间计算按位异或。

我们所知道的是没有必要计算 xnor(x, y) 和 xnor(y, x)，计算其中之一就足够了。此外，就我而言，不需要计算 xnor(x, x)。我知道这个问题的两种解决方案：

def xnor(p, q):
    return abs(p-1) * abs(q-1) + (p * q)

def solution_one(M):
    return xor(M[:, None], M).all(axis=2)

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper triangular indices of 
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii,jj):
        results[i,j] = xnor(M[i], M[j]).all(axis=0)

    return results

用我们得到的一些任意大矩阵来运行这两者，得到解决方案一

# 100x100
%timeit solution_one(M)
# 3.55 ms ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 500x500
%timeit solution_one(M)
# 1.36 s ± 59.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_one(M)
# 27.4 s ± 603 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

解决方案二

# 100x100
%timeit solution_two(M)
# 29.5 ms ± 535 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 500x500
%timeit solution_two(M)
# 1.06 s ± 41.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_two(M)
# 5.58 s ± 95.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

如您所见，对于较小的实例，solution_one 比 solution_two 快，但对于较大的实例则相反。

我们可以看到solution_one 正在执行n x n 计算，而solution_two 将执行（正如 Naphat Amundsen 在 cmets 中提到的）((n x n) - n) / 2。有更好的解决方案吗？是否可以以某种方式构造一个新的 M 矩阵，使得解决方案更接近((n x n) - n) / 2 计算？

【问题讨论】：

您好，这是一个有趣的问题。正如你所说，我有点想知道solution_two 是如何完成超过一半的(n x n) 的。不是在做((n x n) - n) / 2吗？
是的，您是对的，感谢您的关注！我会更新我的问题。当我应该做triu时，我做了tril

标签： python numpy matrix

【解决方案1】：

我相信在这种情况下您可以进行的最少计算次数是((n x n) - n) / 2。如果是这种情况，那么就复杂性而言，您的 solution_two 已经是最佳选择。但是，我相信你能让solution_two 更加矢量化，这可能会提高运行时性能。这是我的尝试：

import numpy as np

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper triangular indices of
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii,jj):
        results[i,j] = xnor(M[i], M[j]).all(axis=0)

    return results

def solution_two_vectorized(M):
    n = M.shape[0]
    results = np.zeros((n, n))
    ii, jj = np.triu_indices(n, 1)
    results[ii,jj] = xnor(M[ii], M[jj]).all(-1)
    return results

# Quick test checking that your outputs are same as mine on some random matrices
import time
# Generator of random test matrices
Ms = (np.random.randint(-10,10,(np.random.randint(100,200),np.random.randint(100,200))) for i in
      range(100))

vectortimes = []
loopytimes = []

for M in Ms:
    t0 = time.time()
    out_vectorized = solution_two_vectorized(M)
    vectortimes.append(time.time() - t0)

    t0 = time.time()
    out_loopy = solution_two(M)
    loopytimes.append(time.time() - t0)

    if not np.allclose(out_vectorized, out_loopy):
        print("Output mismatch!")
        break
else:
    print("Success!")
# Success!

mean_vectortimes = np.mean(vectortimes)
mean_loopytimes = np.mean(loopytimes)

print("Vector times: ", mean_vectortimes)
# Vector times:  0.017428524494171142
print("Loopy times: ", mean_loopytimes)
# Loopy times:  0.07040918111801148
print(f"Performance improvement: {mean_loopytimes/mean_vectortimes} times")
# Performance improvement: 4.039881926984661 times

【讨论】：

很好，谢谢！是的，这感觉就像介于两者之间。然而，对于较大的矩阵（≥ 500x500），它实际上比早期的两种解决方案都慢。从长远来看，时间索引可能比循环和计算更糟糕？
好吧，这很不幸。我现在自己检查了一下，我同意，对于大输入来说确实比较慢。花哨的索引：M[ii] 和 M[jj] 创建副本而不是视图，因此您最终会进行大量复制。像您一样使用 for 循环可以避免上述副本，因为您只进行基本索引。我真的没有想到这一点。我可以建议使用numba jit 编译solution_two 函数吗？
使用numba 是一个巨大的改进。对于每个解决方案，几乎是一半的时间，现在矢量化的解决方案比循环解决方案更快
真的吗？您应该提交您的结果作为答案！我（可能还有其他人）很想看到他们:)