【Posted】:2022-02-21 01:44:21
【Problem description】:
I want to add and multiply arrays built from transposes of one array.
Let A be an array.
sum += p(A) * factor(p)
where p is a permutation/transposition and factor(p) is a list of prefactors. For example, if A is a two-dimensional array, the target is
0.1*A + 0.1*transpose(A,(1,0))
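In my Python code I found that, when higher-dimensional permutations are applied, the array addition takes far longer than the transposition. Perhaps numpy.transpose uses pointers in C. Is there any way to optimize the timing of the array-addition part? numpy.add does not help much. Should I somehow sum only the part of the array that the transposition affects and use multiplication for the rest? For example, for the permutation (0,1,3,2), the (0,1) part is multiplied by a common factor on top of the previous array. Or should I use cpython to improve performance?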
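A small sketch that may help frame the guess about pointers: np.transpose returns a view that shares the original buffer and only permutes shape and strides, so it copies no data. The array shape and axis order below are arbitrary choices for illustration, not taken from the script.

```python
import numpy as np

# Arbitrary small example array (shape chosen only for illustration).
A = np.random.random((4, 5, 6))
B = np.transpose(A, (2, 0, 1))

# B is a view: it shares A's memory, and only the strides are permuted.
print(np.shares_memory(A, B))
print(A.strides, B.strides)

# The view is no longer C-contiguous, so any operation that reads it
# (multiply, add, copy) has to walk memory with non-unit strides.
print(B.flags['C_CONTIGUOUS'])
```

If that is what happens here, the transposition timing measures only the creation of the view, and the cost of the strided memory access shows up in the first operation that actually reads B.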
Here is my Python code
import numpy as np
import time
import itertools as it

ref_list = [0, 1, 2, 3, 4]
p = it.permutations(ref_list)
transpose_list = tuple(p)
print(type(transpose_list),type(transpose_list[0]),transpose_list[0])

n_loop = 2
na = nb = nc = nd = ne = 20
A = np.random.random((na,nb,nc,nd,ne))
sum_A = np.zeros((na,nb,nc,nd,ne))
factor_list = [i*0.1 for i in range(120)]

time_transpose = 0
time_add = 0
time_multiply = 0

for n in range(n_loop):
    for m, t in enumerate(transpose_list):
        start = time.time()
        B = np.transpose(A, transpose_list[m] )
        finish = time.time()
        time_transpose += finish - start

        start = time.time()
        B_p = B * factor_list[m]
        finish = time.time()
        time_multiply += finish - start

        start = time.time()
        sum_A += B_p
        finish = time.time()
        time_add += finish - start

print(time_transpose, time_multiply, time_add, time_multiply/time_transpose, time_add/time_transpose)
The output is
0.004961967468261719 1.1218750476837158 3.7830252647399902 226.09480107630213 762.404285988852
The addition takes roughly 760 times longer than the transposition, and the multiplication roughly 226 times longer.
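One variant of the loop that could be tried, as a sketch only: write each scaled term into a preallocated scratch buffer with np.multiply(..., out=...) instead of allocating a new B_p on every iteration. The helper name weighted_permutation_sum and the small 3-D test array are hypothetical, and it assumes all axes have the same length, as in the script. Whether this is actually faster has not been measured here.

```python
import itertools as it
import numpy as np

def weighted_permutation_sum(A, perms, factors):
    """Return sum over m of factors[m] * transpose(A, perms[m]).

    Hypothetical helper; assumes every axis of A has the same length,
    so that each transposed view has the same shape as A.
    """
    total = np.zeros_like(A)
    scratch = np.empty_like(A)  # reused for every term instead of a fresh B_p
    for perm, factor in zip(perms, factors):
        # np.transpose only creates a view; the strided read happens
        # inside np.multiply, which writes the result into scratch.
        np.multiply(np.transpose(A, perm), factor, out=scratch)
        total += scratch  # both operands are contiguous here
    return total

n = 6
A = np.random.random((n, n, n))
perms = list(it.permutations(range(A.ndim)))
factors = [0.1 * i for i in range(len(perms))]
result = weighted_permutation_sum(A, perms, factors)
```

In this form the addition itself only touches contiguous arrays, so timing the two statements separately would show how much of the cost belongs to reading the transposed view rather than to the addition.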
I tried using the numba transpose from "How to avoid huge overhead of single-threaded NumPy's transpose?"
by adding
import numba as nb

@nb.njit('void(float64[:,::1], float64[:,::1])', parallel=True)
def transpose(mat, out):
    blockSize, tileSize = 256, 32  # To be tuned
    n, m = mat.shape
    assert blockSize % tileSize == 0
    for tmp in nb.prange((m+blockSize-1)//blockSize):
        i = tmp * blockSize
        for j in range(0, n, blockSize):
            tiMin, tiMax = i, min(i+blockSize, m)
            tjMin, tjMax = j, min(j+blockSize, n)
            for ti in range(tiMin, tiMax, tileSize):
                for tj in range(tjMin, tjMax, tileSize):
                    out[ti:ti+tileSize, tj:tj+tileSize] = mat[tj:tj+tileSize, ti:ti+tileSize].T
and calling it with
B = transpose(A, transpose_list[m] )
I got
Traceback (most recent call last):
  File "transpose_test_v2.py", line 46, in <module>
    B = transpose(A, transpose_list[m] )
  File "/home/.../lib/python3.8/site-packages/numba/core/dispatcher.py", line 717, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) array(float64, 6d, C), UniTuple(int64 x 6)
Or, using
B = nb.transpose(A, transpose_list[m] )
I got
B = nb.transpose(A, transpose_list[m] )
AttributeError: 'int' object has no attribute 'transpose'
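For context on the two errors: the signature string 'void(float64[:,::1], float64[:,::1])' declares two 2-D C-contiguous float64 arrays and no return value, so the function expects to be called as transpose(mat, out) and not with an N-dimensional array plus an axes tuple. The AttributeError looks consistent with the script rebinding nb to the integer 20 in na = nb = nc = nd = ne = 20, which shadows the numba alias. The sketch below reproduces the same calling convention in plain NumPy, without numba or parallelism; the name tiled_transpose_2d and the test shapes are made up for illustration.

```python
import numpy as np

def tiled_transpose_2d(mat, out, tile=32):
    """Write mat.T into out tile by tile (2-D arrays only).

    Hypothetical plain-NumPy stand-in for the (mat, out) convention of the
    numba function above; it returns nothing and fills out in place.
    """
    n, m = mat.shape
    assert out.shape == (m, n)
    for ti in range(0, m, tile):
        for tj in range(0, n, tile):
            # Slices past the array edge are clipped by NumPy, so partial
            # tiles at the borders are handled without extra bounds checks.
            out[ti:ti + tile, tj:tj + tile] = mat[tj:tj + tile, ti:ti + tile].T

mat = np.random.random((70, 45))
out = np.empty((45, 70))
tiled_transpose_2d(mat, out)  # result is in out; the return value is None
```

Applying a routine like this to the 5-D problem would need an extra step, for example reshaping to 2-D where the permutation allows it, which this sketch does not attempt.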
【Discussion】:
Tags: python numpy optimization transpose