Python: Process Array Faster (Answers)

【Question Title】: Python: process array faster
【Posted】: 2013-08-15 14:18:17
【Question】:

I have to process a lot of arrays containing 512x256 pixel-like data, but most of the entries are 0, so I only want to save the non-zero values, i.e.:

import numpy as np
import time

xlist=[]
ylist=[]
zlist=[]

millis = time.time()*1000
ar = np.zeros((512,256),dtype=np.uint16)

for x in range(0,512):
    for y in range(0,256):
        if (0<ar[x][y]<1000):
            xlist.append(x)
            ylist.append(y)
            zlist.append(ar[x][y])

print(time.time()*1000-millis)

On my computer this takes about 750 ms. Is there a faster way to do this? I have to process thousands of these pixel arrays.

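【Comments】:

It looks like you are working with sparse matrices. SciPy provides several classes to choose from: docs.scipy.org/doc/scipy/reference/sparse.html
In general, you get much better performance with numpy arrays if you can avoid writing Python loops over them. As a side note, if this is Python 2.x, simply changing range to xrange gives a small performance boost.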
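To illustrate the "avoid loops" advice, here is a minimal sketch that runs the question's nested loop and a loop-free version on the same frame, checks that both return identical x, y, z arrays, and times them with timeit. The function names `loop_extract` and `vectorized_extract` and the random test frame are hypothetical, and the loop uses `ar[x, y]` instead of `ar[x][y]`; no timing numbers are claimed here, since they depend on the machine.

```python
import timeit

import numpy as np


def loop_extract(ar):
    """Question's approach (hypothetical name): test every pixel in Python."""
    xlist, ylist, zlist = [], [], []
    for x in range(ar.shape[0]):
        for y in range(ar.shape[1]):
            if 0 < ar[x, y] < 1000:
                xlist.append(x)
                ylist.append(y)
                zlist.append(ar[x, y])
    return np.array(xlist), np.array(ylist), np.array(zlist, dtype=ar.dtype)


def vectorized_extract(ar):
    """Same selection via one boolean mask (hypothetical name), no Python loop."""
    mask = (ar > 0) & (ar < 1000)
    xs, ys = np.nonzero(mask)  # row-major order, same as the nested loop
    return xs, ys, ar[mask]


# Hypothetical test data: a mostly-zero 512x256 uint16 frame.
rng = np.random.default_rng(0)
ar = np.zeros((512, 256), dtype=np.uint16)
idx = rng.integers(0, ar.size, size=2000)
ar.flat[idx] = rng.integers(1, 2000, size=idx.size)

# Both approaches must select exactly the same entries in the same order.
for a, b in zip(loop_extract(ar), vectorized_extract(ar)):
    assert np.array_equal(a, b)

print("loop:      ", timeit.timeit(lambda: loop_extract(ar), number=1), "s")
print("vectorized:", timeit.timeit(lambda: vectorized_extract(ar), number=1), "s")
```

The mask version does the comparison and the index collection inside numpy's compiled code, which is why it avoids the per-pixel interpreter overhead of the loop.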

Tags: python arrays performance numpy

【Answer 1】:

You could try something like this:

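import numpy as np

ar = np.zeros((512,256),dtype=np.uint16)

# there should be something here to fill ar

xs = np.arange(ar.shape[0])
ys = np.arange(ar.shape[1])

# boolean mask of the entries to keep (same test as the loop)
check = (0 < ar) & (ar < 1000)
# with a single argument, np.where returns (row indices, column indices)
ind = np.where( check )
xlist = xs[ ind[0] ]
ylist = ys[ ind[1] ]
# values come out in the same row-major order as the indices
zlist = ar[ check ]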
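Since `xs` and `ys` above are just `np.arange(...)`, indexing them with `ind[0]` and `ind[1]` returns those index arrays unchanged, so the coordinates can be taken straight from `np.nonzero` (which is what `np.where` does with one argument). A minimal sketch of that simplification follows; the helper name `extract` and its `lo`/`hi` parameters are hypothetical. It also assumes, for the "thousands of arrays" part, that a batch of frames can be stacked into one `(n_frames, 512, 256)` array in memory (about 256 KiB per uint16 frame), in which case the same mask yields a frame index as the first coordinate.

```python
import numpy as np


def extract(ar, lo=0, hi=1000):
    """Hypothetical helper: coordinates and values of entries with lo < value < hi.

    Works for a single 2-D frame or an N-D stack of frames.
    """
    mask = (ar > lo) & (ar < hi)
    # np.nonzero returns one index array per axis in C (row-major) order,
    # which matches the order in which ar[mask] yields the values.
    return (*np.nonzero(mask), ar[mask])


frame = np.zeros((512, 256), dtype=np.uint16)
frame[3, 7] = 42
frame[100, 0] = 5000   # outside the range, ignored
frame[511, 255] = 999

xlist, ylist, zlist = extract(frame)
print(xlist.tolist(), ylist.tolist(), zlist.tolist())  # [3, 511] [7, 255] [42, 999]

# Hypothetical batch: several frames stacked into one (n_frames, 512, 256) array.
stack = np.stack([frame, frame * 0, frame])
fidx, xs, ys, zs = extract(stack)
print(fidx.tolist())  # [0, 0, 2, 2]
```

Processing a stack in one call replaces a Python-level loop over frames with a single vectorized pass; if memory is tight, the frames can be handled in chunks instead.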

【Comments】:

Yes, that's great! Thanks!

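【Answer 2】:

SciPy has good support for sparse matrices, which should handle your problem well. See the documentation for the scipy.sparse module here.

To convert your numpy array into a coordinate-based (COO) sparse matrix, which gives the same kind of x, y, z lists your code builds above (except that it keeps every nonzero value, not only those below 1000), you can do the following: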

import numpy as np
from scipy import sparse

#your original matrix
A = np.array([[1,0,3],[0,5,6],[0,0,9]])

#We let scipy create a sparse matrix from your data
sA = sparse.coo_matrix(A)

#The x,y,z

xlist,ylist,zlist = sA.row,sA.col,sA.data

print((xlist, ylist, zlist))

#This will print (exact dtypes may vary by platform): (array([0, 0, 1, 1, 2], dtype=int32), array([0, 2, 1, 2, 2], dtype=int32), array([1, 3, 5, 6, 9]))

Because SciPy code is generally highly optimized, this should run faster than your loop solution (although I haven't timed it).
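`coo_matrix` keeps every nonzero entry, while the question's loop also rejects values of 1000 or more. A minimal sketch, assuming it is acceptable to zero out the rejected values before conversion, applies the same filter and then round-trips the saved triplets back to a dense frame with the `(data, (row, col))` constructor form. The helper name `to_coo` and its `hi` parameter are hypothetical.

```python
import numpy as np
from scipy import sparse


def to_coo(ar, hi=1000):
    """Hypothetical helper: COO matrix of the entries with 0 < value < hi."""
    # coo_matrix drops zeros when built from a dense array, so zeroing the
    # values >= hi reproduces the loop's test. (For signed data, negative
    # values would still be kept; uint16 has none.)
    filtered = np.where(ar < hi, ar, 0).astype(ar.dtype)
    return sparse.coo_matrix(filtered)


ar = np.zeros((512, 256), dtype=np.uint16)
ar[3, 7] = 42
ar[100, 0] = 5000  # rejected by the < 1000 test
ar[511, 255] = 999

sA = to_coo(ar)
xlist, ylist, zlist = sA.row, sA.col, sA.data
print(xlist.tolist(), ylist.tolist(), zlist.tolist())  # [3, 511] [7, 255] [42, 999]

# Round trip: rebuild the dense frame from the saved triplets.
restored = sparse.coo_matrix((zlist, (xlist, ylist)), shape=ar.shape).toarray()
assert restored[3, 7] == 42 and restored[100, 0] == 0
```

Storing only `row`, `col`, and `data` per frame keeps the saved size proportional to the number of kept pixels, and the round trip shows nothing else is needed to recover the filtered frame.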

【Comments】: