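【Posted】: 2012-01-10 11:27:13
【Question】:
I work with fairly large matrices in Python/Scipy. I need to extract rows from a large matrix (loaded into a coo_matrix) and use them as the elements of a diagonal. Currently I do it like this: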
import profile

import numpy as np
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        # densify row i, then put it on the main diagonal
        diag_elems = np.array(A[i,:].todense())
        ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
        #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csc")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')
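What I see in the profile output is that most of the time is spent in the get_csr_submatrix function while extracting diag_elems. That makes me think I am using either an inefficient sparse representation of the initial data or the wrong way of extracting a row from a sparse matrix. Can you suggest a better way to extract a row from a sparse matrix and represent it in diagonal form?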
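To see where the time goes, the profiler output can be sorted by cumulative time. The following is only a rough sketch on a much smaller matrix than the one in the question; it uses the standard-library cProfile and pstats modules rather than profile.run, and the matrix size is an assumption chosen to keep the run short:

```python
import cProfile
import io
import pstats

import numpy as np
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        # Same pattern as in the question: densify the row, then build the diagonal.
        diag_elems = np.asarray(A[i, :].todense())
        sparse.spdiags(diag_elems, 0, A.shape[1], A.shape[1], format="csc")

# Small stand-in matrix (assumption: sizes reduced for a quick run).
A = sparse.random(50, 2000, density=0.02, format="csc", random_state=0)

profiler = cProfile.Profile()
profiler.enable()
computation(A)
profiler.disable()

# Collect the ten most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The function names that appear near the top of the report depend on the installed SciPy version, so they may not match the get_csr_submatrix entry mentioned above.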
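Edit
The following variant removes the row-extraction bottleneck (note that simply changing 'csc' to 'csr' is not enough; A[i,:] must also be replaced with A.getrow(i)). The main question, however, is how to skip the materialization (.todense()) and create the diagonal matrix from the sparse representation of the row.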
import profile

import numpy as np
from scipy import sparse

def computation(A):
    for i in range(A.shape[0]):
        # getrow on a CSR matrix is cheap, but the row is still densified here
        diag_elems = np.array(A.getrow(i).todense())
        ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1], format = "csc")
        #...

#create some random matrix
A = (sparse.rand(1000,100000,0.02,format="csr")*5).astype(np.ubyte)
#get timings
profile.run('computation(A)')
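A possible reason row access is cheap in CSR is that each row's nonzeros are stored contiguously, so row i is just a slice of the indices and data arrays delimited by indptr. A minimal sketch of that layout, assuming A is in canonical CSR format; the helper name iter_sparse_rows is hypothetical and not part of SciPy:

```python
import numpy as np
from scipy import sparse

def iter_sparse_rows(A):
    """Yield (column_indices, values) for each row of a CSR matrix A."""
    for i in range(A.shape[0]):
        # indptr[i]:indptr[i + 1] delimits the stored entries of row i.
        start, end = A.indptr[i], A.indptr[i + 1]
        yield A.indices[start:end], A.data[start:end]

# Small random CSR matrix used only for illustration.
A = sparse.random(6, 10, density=0.3, format="csr", random_state=0)
rows = list(iter_sparse_rows(A))
```

The slices are views into the matrix's own arrays, so no per-row sparse matrix object is created and nothing is densified.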
If I create the DIAgonal matrix directly from the 1-row CSR matrix, as follows:
diag_elems = A.getrow(i)
ith_diag = sparse.spdiags(diag_elems,0,A.shape[1],A.shape[1])
then I can neither specify the format="csc" argument nor convert ith_diag to CSC format:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/profile.py", line 70, in run
    prof = prof.run(statement)
  File "/usr/local/lib/python2.6/profile.py", line 456, in run
    return self.runctx(cmd, dict, dict)
  File "/usr/local/lib/python2.6/profile.py", line 462, in runctx
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "<stdin>", line 4, in computation
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/construct.py", line 56, in spdiags
    return dia_matrix((data, diags), shape=(m,n)).asformat(format)
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/base.py", line 211, in asformat
    return getattr(self,'to' + format)()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/dia.py", line 173, in tocsc
    return self.tocoo().tocsc()
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/coo.py", line 263, in tocsc
    data = np.empty(self.nnz, dtype=upcast(self.dtype))
  File "/usr/local/lib/python2.6/site-packages/scipy/sparse/sputils.py", line 47, in upcast
    raise TypeError,'no supported conversion for types: %s' % args
TypeError: no supported conversion for types: object
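The "object" type in the error suggests that spdiags treated the sparse row as an opaque Python object rather than as numeric data. One way around this might be to skip spdiags and place the row's stored values on the diagonal by their column indices. This is only a sketch under that assumption; the helper name diag_from_sparse_row and its fmt parameter are hypothetical:

```python
import numpy as np
from scipy import sparse

def diag_from_sparse_row(row, fmt="csc"):
    """Turn a 1 x n sparse row into an n x n sparse diagonal matrix."""
    n = row.shape[1]
    coo = row.tocoo()  # exposes column indices (.col) and values (.data)
    # Entry (0, j) of the row becomes entry (j, j) of the result.
    diag = sparse.coo_matrix((coo.data, (coo.col, coo.col)), shape=(n, n))
    return diag.asformat(fmt)

# Small random CSR matrix used only for illustration.
A = sparse.random(4, 12, density=0.5, format="csr", random_state=0)
D = diag_from_sparse_row(A[1])
```

Only the stored entries of the row are touched, so the cost should scale with the number of nonzeros in the row rather than with its full length.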
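【Comments】:
- Have you tried format="csr"?
- Using 'csr' for the initial data and replacing A[i,:] with A.getrow(i), I got a significant speedup. But what I am looking for is a way to avoid materializing the row before creating the diagonal matrix. Any ideas?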
Tags: python numpy scipy sparse-matrix