Python: Store indices of non-zero unique rows after comparing each row with every other row in a matrix

[Question title]: Python: Store indices of non-zero unique rows after comparing each row with every other row in a matrix
[Posted]: 2018-04-16 12:29:04
[Question description]:

For this matrix K =

 [[-1.  1.  0.]
 [ 0.  0.  0.]
 [ 0. -1.  1.]
 [ 0.  0.  0.]
 [ 0. -1.  1.]
 [ 0.  0.  0.]]

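The task is to store the indices of the non-zero unique rows in an array (here the answer is {0,2}), so that

K([0,2],:)

can be used for linear algebra operations. My attempt is: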

import numpy as np  # K is the 6x3 matrix shown above

myList = []
for i in range(len(K)):  # generate pairs
    for j in range(i + 1, len(K)):  # travel down each of the later rows
        # np.any is called without axis so it also works when K[i] is a 1-D row
        if np.array_equal(K[i], K[j]) and np.any(K[i] != 0) and np.any(K[j] != 0):
            myList.append(K[i])
            print('indices of similar-non-zeros rows are\n', (i, j))
        elif not np.array_equal(K[i], K[j]) and np.any(K[i] != 0) and np.any(K[j] != 0):
            myList.append(K[i])
            print('indices of non-similar-non-zeros rows are\n', (i, j))
        else:
            continue

new_K = np.asmatrix(np.asarray(myList))
new_new_K = np.unique(new_K,axis=0)
print('Now K is \n',new_new_K)

The answer is:

    new_new_K = [[-1.  1.  0.]
                 [ 0. -1.  1.]]

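Question 1: How can this be done in a Pythonic way? The above is a workaround that stores the matrix rows themselves, but storing the indices in an array is preferable.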
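For reference, the K([0,2],:) notation above corresponds to integer-array indexing in NumPy. The following is a minimal sketch, assuming K is a plain np.ndarray and that the index list rows is already known; the names rows and K_reduced are only illustrative.

```python
import numpy as np

K = np.array([[-1.,  1.,  0.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.]])

rows = [0, 2]            # indices of the non-zero unique rows (assumed known here)
K_reduced = K[rows, :]   # select whole rows; K[rows] is equivalent
print(K_reduced.shape)   # (2, 3)
```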

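[Question comments]:

What is your main motivation for seeking a better solution? Performance or readability (or both)?
Thanks for the reply @jpp. Both readability and performance matter, because I have to apply the same idea to a very large matrix K that is updated after each optimization iteration.
Also, it would be nice to see some benchmarks if possible.
Did the posted solutions work for you?
@Divakar Thanks. Both solutions work for this small matrix case, but I am still looking for a solution with better performance.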

Tags: python arrays python-3.x numpy matrix

[Solution 1]:

You can use a simple for loop with enumerate for this.

import numpy as np

A = np.array([[-1,  1,  0],
              [ 0,  0,  0],
              [ 0, -1,  1],
              [ 0,  0,  0],
              [ 0, -1,  1],
              [ 0,  0,  0]])

seen = {(0, 0, 0)}
res = []

for idx, row in enumerate(map(tuple, A)):
    if row not in seen:
        res.append(idx)
        seen.add(row)

Result

print(A[res])

[[-1  1  0]
 [ 0 -1  1]]

Example #2

import numpy as np

A=np.array([[0, 1, 0, 0, 0, 1],
            [0, 0, 0, 1, 0, 1],
            [0, 1, 0, 0, 0, 1],
            [1, 0, 1, 0, 1, 1],
            [1, 1, 1, 0, 0, 0],
            [0, 1, 0, 1, 0, 1],
            [0, 0, 0, 0, 0, 0]])

seen={(0, )*6}

res = []

for idx, row in enumerate(map(tuple, A)):
    if row not in seen:
        res.append(idx)
        seen.add(row)

print(A[res])

# [[0 1 0 0 0 1]
#  [0 0 0 1 0 1]
#  [1 0 1 0 1 1]
#  [1 1 1 0 0 0]
#  [0 1 0 1 0 1]]
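Both examples repeat the same loop and only differ in the width of the all-zero tuple. As a rough sketch, the loop could be wrapped in a helper that derives that width from the array shape. The function name first_nonzero_unique_rows is hypothetical and not part of the original answer.

```python
import numpy as np

def first_nonzero_unique_rows(a):
    """Return indices of the first occurrence of each distinct non-zero row."""
    a = np.asarray(a)
    # Pre-seed the set with the all-zero row so that it is always skipped.
    seen = {(0,) * a.shape[1]}
    res = []
    # tolist() converts to plain Python scalars, so the tuples hash cheaply.
    for idx, row in enumerate(map(tuple, a.tolist())):
        if row not in seen:
            res.append(idx)
            seen.add(row)
    return res

K = np.array([[-1.,  1.,  0.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.]])

print(first_nonzero_unique_rows(K))  # [0, 2]
```

Because the indices are appended in iteration order, the result comes out sorted without an extra step.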


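[Solution 2]:

You can use np.unique with its axis parameter to get the first-occurrence indices of the unique rows, and then filter out the index whose corresponding row is all zeros, like so -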

import numpy as np

def unq_row_indices_wozeros(a):
    # Get unique rows and their first occurring indices
    unq, idx = np.unique(a, axis=0, return_index=True)

    # Filter out the index, the corresponding row of which is ALL 0s
    return idx[(unq != 0).any(axis=1)]
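As a quick check against the matrix from the question, here is a minimal sketch that applies the same function to K. The function is repeated so the snippet runs on its own, and the np.sort call is an addition to get the indices in ascending order, since np.unique returns them in the order of the sorted rows.

```python
import numpy as np

def unq_row_indices_wozeros(a):
    # Unique rows (sorted lexicographically) and the index of each first occurrence
    unq, idx = np.unique(a, axis=0, return_index=True)
    # Keep only indices whose unique row has at least one non-zero entry
    return idx[(unq != 0).any(axis=1)]

K = np.array([[-1.,  1.,  0.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.],
              [ 0., -1.,  1.],
              [ 0.,  0.,  0.]])

idx = np.sort(unq_row_indices_wozeros(K))
print(idx)      # [0 2]
print(K[idx])   # the two non-zero unique rows
```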

Sample run -

In [53]: # Setup input array with few all zero rows and duplicates
    ...: np.random.seed(0)
    ...: a = np.random.randint(0,9,(10,3))
    ...: a[[2,5,7]] = 0
    ...: a[4] = a[1]
    ...: a[8] = a[3]

In [54]: a
Out[54]: 
array([[5, 0, 3],
       [3, 7, 3],
       [0, 0, 0],
       [7, 6, 8],
       [3, 7, 3],
       [0, 0, 0],
       [1, 5, 8],
       [0, 0, 0],
       [7, 6, 8],
       [2, 3, 8]])

In [55]: unq_row_indices_wozeros(a)
Out[55]: array([6, 9, 1, 0, 3])

# Sort those indices if needed
In [56]: np.sort(unq_row_indices_wozeros(a))
Out[56]: array([0, 1, 3, 6, 9])
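Since benchmarks were requested in the comments, the following is a rough timing sketch rather than a measured result. The helper names loop_indices and unique_indices, the array size, and the value range are all assumptions chosen for illustration; actual timings depend on the shape of the matrix and the number of duplicate rows, so it is worth running this on data that resembles the real K.

```python
import timeit
import numpy as np

def loop_indices(a):
    # Set-based loop: keeps the first index of every distinct non-zero row
    seen = {(0,) * a.shape[1]}
    res = []
    for idx, row in enumerate(map(tuple, a.tolist())):
        if row not in seen:
            res.append(idx)
            seen.add(row)
    return np.array(res, dtype=np.intp)

def unique_indices(a):
    # np.unique-based version, sorted so both functions return the same order
    unq, idx = np.unique(a, axis=0, return_index=True)
    return np.sort(idx[(unq != 0).any(axis=1)])

# Hypothetical test data: many duplicate rows plus some all-zero rows
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(20000, 3))
a[::7] = 0

# Both approaches should agree before their speed is compared
assert np.array_equal(loop_indices(a), unique_indices(a))

for fn in (loop_indices, unique_indices):
    t = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per call")
```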
