您可以避免d2m 矩阵创建和x_smf 向量计算的循环,使用sps.diags 创建一个稀疏三对角矩阵,您可以将其转换为数组以便能够编辑第一行和最后一行。您的代码将如下所示(请注意,第 10 行中diags 的结果已使用scipy.sparse.dia_matrix.toarray 方法转换为密集的ndarray):
import numpy as np
import scipy.sparse as sps
# Dense tridiagonal matrix
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
Valdi_Bo 提出的解决方案可以让你去掉第二个 FOR 循环:
x_smf = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
但是,我想引起您的注意,x_smf 矩阵是稀疏的,并且将其存储为密集的 ndarray 对计算时间和内存存储都不利。我建议您不要转换为密集的 ndarray,而是转换为 sparse matrix format。例如lil_matrix,这是一个列表稀疏矩阵格式的列表,使用 tolil() 方法而不是 toarray():
# Sparse tridiagonal matrix
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
这是一个脚本,它在更大的情况下比较了三个实现num=4000(对于num=5,都给出324)。对于这种大小,我已经看到了使用稀疏矩阵的好处,这是整个脚本(第一行是num 与5 不同的代码的概括):
from time import time
import numpy as np
import scipy.sparse as sps
num = 4000
A = np.concatenate([np.arange(1, (num-2)*num+1).reshape(num, num-2), np.repeat([[4, 5]], num, axis=0)], axis=1)
sm_factor = 0.1*np.ones(num)
########## DENSE matrix + FOR loop ##########
d2m = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).toarray() # cast to array
# First line boundary conditions
d2m[0, 0] = 2
d2m[0, 1] = -5
d2m[0, 2] = 4
d2m[0, 3] = -1
# Last line boundary conditions
d2m[num-1, num-4] = -1
d2m[num-1, num-3] = 4
d2m[num-1, num-2] = -5
d2m[num-1, num-1] = 2
# FOR loop version
t_start = time()
x_smf = 0
for i in range(len(sm_factor)):
x_smf = x_smf + sm_factor[i] * (d2m @ (A[i, :]).T).T @ (d2m @ (A[i, :]).T)
print(f'FOR loop version time: {time()-t_start}s')
print(f'FOR loop version value: {x_smf}\n')
########## DENSE matrix + VECTORIZED ##########
t_start = time()
x_smf_v = np.sum(sm_factor * np.square(d2m @ A.T).sum(axis=0))
print(f'VECTORIZED version time: {time()-t_start}s')
print(f'VECTORIZED version value: {x_smf_v}\n')
########## SPARSE matrix + VECTORIZED ##########
d2m_s = sps.diags([1, -2, 1], [-1, 0, 1], shape=(num, num)).tolil() # cast to lil
# First line boundary conditions
d2m_s[0, 0] = 2
d2m_s[0, 1] = -5
d2m_s[0, 2] = 4
d2m_s[0, 3] = -1
# Last line boundary conditions
d2m_s[num-1, num-4] = -1
d2m_s[num-1, num-3] = 4
d2m_s[num-1, num-2] = -5
d2m_s[num-1, num-1] = 2
t_start = time()
x_smf_s = np.sum(sm_factor * np.square(d2m_s @ A.T).sum(axis=0))
print(f'SPARSE+VECTORIZED version time: {time()-t_start}s')
print(f'SPARSE+VECTORIZED version value: {x_smf_s}\n')
这是我在运行代码时得到的:
FOR loop version time: 25.878241777420044s
FOR loop version value: 3.752317536763356e+17
VECTORIZED version time: 1.0873610973358154s
VECTORIZED version value: 3.752317536763356e+17
SPARSE+VECTORIZED version time: 0.37279224395751953s
SPARSE+VECTORIZED version value: 3.752317536763356e+17
如您所见,使用稀疏矩阵可以让您在计算时间上获得另一个因素 3,并且不需要您调整之后的代码。测试稀疏矩阵的各种 scipy 实现(tocsc()、tocsr()、todok() 等)也是一个很好的策略,其中一些可能更适合您的情况。