您可以使用np.ravel_multi_index 将元素作为二维索引将这些行转换为一维数组。然后,使用np.unique 为我们提供每个唯一行的开始位置,并且还有一个可选参数return_counts 为我们提供计数。因此,实现看起来像这样 -
def unique_rows_counts(a):
# Calculate linear indices using rows from a
lidx = np.ravel_multi_index(a.T,a.max(0)+1 )
# Get the unique indices and their counts
_, unq_idx, counts = np.unique(lidx, return_index = True, return_counts=True)
# return the unique groups from a and their respective counts
return a[unq_idx], counts
示例运行 -
In [64]: a
Out[64]:
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2],
[3, 3, 0],
[3, 3, 0],
[3, 3, 0]])
In [65]: unqrows, counts = unique_rows_counts(a)
In [66]: unqrows
Out[66]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 0]])
In [67]: counts
Out[67]: array([3, 3, 3])
基准测试
假设您可以使用 numpy 数组或集合作为输出,可以对目前提供的解决方案进行基准测试,就像这样 -
函数定义:
import numpy as np
from collections import Counter
def unique_rows_counts(a):
lidx = np.ravel_multi_index(a.T,a.max(0)+1 )
_, unq_idx, counts = np.unique(lidx, return_index = True, return_counts=True)
return a[unq_idx], counts
def map_Counter(a):
return Counter(map(tuple, a))
def forloop_Counter(a):
c = Counter()
for x in a:
c[tuple(x)] += 1
return c
时间安排:
In [53]: a = np.random.randint(0,4,(10000,5))
In [54]: %timeit map_Counter(a)
10 loops, best of 3: 31.7 ms per loop
In [55]: %timeit forloop_Counter(a)
10 loops, best of 3: 45.4 ms per loop
In [56]: %timeit unique_rows_counts(a)
1000 loops, best of 3: 1.72 ms per loop