Approach #1
Here is one approach using lex-sorting and np.bincount:
import numpy as np

# Perform lex sort and get the sorted array version of the input
sorted_idx = np.lexsort(A.T)
sorted_Ar = A[sorted_idx,:]
# Mask of start of each unique row in sorted array
mask = np.append(True,np.any(np.diff(sorted_Ar,axis=0),1))
# Get counts of each unique row
unq_count = np.bincount(mask.cumsum()-1)
# Compare counts to 1 and select the corresponding unique row with the mask
out = sorted_Ar[mask][np.nonzero(unq_count==1)[0]]
Note that the output does not preserve the order in which the rows appeared in the input array.
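To make the mask, cumsum and bincount steps concrete, here is a small worked sketch. The input array is made up for illustration, and the intermediate name group_ids is not part of the code above.

```python
import numpy as np

# Hypothetical input: rows [1, 2] and [3, 4] each appear twice
A = np.array([[1, 2],
              [5, 6],
              [3, 4],
              [1, 2],
              [3, 4],
              [7, 8]])

# Sort rows so that identical rows end up next to each other
sorted_Ar = A[np.lexsort(A.T)]

# True wherever a sorted row differs from the row before it
mask = np.append(True, np.any(np.diff(sorted_Ar, axis=0), 1))

# Running group ID for every sorted row (0, 0, 1, 1, 2, 3 here)
group_ids = mask.cumsum() - 1

# Number of rows that fall into each group
unq_count = np.bincount(group_ids)

# Keep the first row of every group whose count is exactly 1
out = sorted_Ar[mask][unq_count == 1]

print(unq_count.tolist())  # [2, 2, 1, 1]
print(out.tolist())        # [[5, 6], [7, 8]]
```

A boolean mask such as unq_count == 1 can index the grouped rows directly, so the np.nonzero call is optional.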
Approach #2
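If the elements are non-negative integers, you can convert the 2D array A to a 1D array by treating each row as an indexing tuple, which should be a fairly efficient solution. Note that the rows come out in sorted order of the 1D keys rather than in input order; apply np.sort to the selected indices before indexing into A if the input order matters. The implementation would be: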
# Convert 2D array A to a 1D array assuming each row as an indexing tuple.
# Each column's base is max+1 so that distinct rows always get distinct keys.
A_1D = A.dot(np.append((A.max(0)+1)[::-1].cumprod()[::-1][1:],1))
# Get sorting indices for the 1D array
sort_idx = A_1D.argsort()
# Mask of start of each unique row in 1D sorted array
mask = np.append(True,np.diff(A_1D[sort_idx])!=0)
# Get the counts of each unique 1D element
counts = np.bincount(mask.cumsum()-1)
# Select the IDs with counts==1 and thus the unique rows from A
out = A[sort_idx[np.nonzero(mask)[0][counts==1]]]
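If the rows need to come back in the order they had in A, one option is to sort the selected row indices before indexing. The following is only a sketch: the helper name unq_rows_keep_order is hypothetical, the input is assumed to hold non-negative integers, and the 1D keys are assumed to fit in the array's integer dtype.

```python
import numpy as np

def unq_rows_keep_order(A):
    # Hypothetical helper; assumes a 2D array of non-negative integers.
    # Mixed-radix weights: each column's base is max+1, so distinct rows
    # map to distinct 1D keys.
    weights = np.append((A.max(0) + 1)[::-1].cumprod()[::-1][1:], 1)
    A_1D = A.dot(weights)

    # Group equal keys together and count the size of each group
    sort_idx = A_1D.argsort()
    mask = np.append(True, np.diff(A_1D[sort_idx]) != 0)
    counts = np.bincount(mask.cumsum() - 1)

    # Original row indices of the rows that occur exactly once
    keep = sort_idx[np.nonzero(mask)[0][counts == 1]]

    # Sorting the indices restores the input order
    return A[np.sort(keep)]

A = np.array([[3, 1],
              [1, 2],
              [3, 1],
              [0, 5]])
result = unq_rows_keep_order(A)
print(result.tolist())  # [[1, 2], [0, 5]]
```

Without the np.sort call, the same rows would come back ordered by key, that is [0, 5] before [1, 2].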
Runtime test and verification
Functions:
def unq_rows_v1(A):
    sorted_idx = np.lexsort(A.T)
    sorted_Ar = A[sorted_idx,:]
    mask = np.append(True,np.any(np.diff(sorted_Ar,axis=0),1))
    unq_count = np.bincount(mask.cumsum()-1)
    return sorted_Ar[mask][np.nonzero(unq_count==1)[0]]
def unq_rows_v2(A):
    # Base of max+1 per column keeps the 1D keys distinct for distinct rows
    A_1D = A.dot(np.append((A.max(0)+1)[::-1].cumprod()[::-1][1:],1))
    sort_idx = A_1D.argsort()
    mask = np.append(True,np.diff(A_1D[sort_idx])!=0)
    return A[sort_idx[np.nonzero(mask)[0][np.bincount(mask.cumsum()-1)==1]]]
Timings and output verification:
In [272]: A = np.random.randint(20,30,(10000,5))
In [273]: unq_rows_v1(A).shape
Out[273]: (9051, 5)
In [274]: unq_rows_v2(A).shape
Out[274]: (9051, 5)
In [275]: %timeit unq_rows_v1(A)
100 loops, best of 3: 5.07 ms per loop
In [276]: %timeit unq_rows_v2(A)
1000 loops, best of 3: 1.96 ms per loop
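Matching shapes only show that both functions return the same number of rows. A stricter check, sketched below, compares the actual rows against a reference built with np.unique(..., axis=0, return_counts=True), which is a different technique used here purely for cross-checking. The canonical helper and the fixed seed are assumptions added for this sketch, and unq_rows_v2 uses a base of max+1 per column.

```python
import numpy as np

def unq_rows_v1(A):
    sorted_Ar = A[np.lexsort(A.T)]
    mask = np.append(True, np.any(np.diff(sorted_Ar, axis=0), 1))
    unq_count = np.bincount(mask.cumsum() - 1)
    return sorted_Ar[mask][unq_count == 1]

def unq_rows_v2(A):
    # Assumes non-negative integers; base is max+1 per column
    A_1D = A.dot(np.append((A.max(0) + 1)[::-1].cumprod()[::-1][1:], 1))
    sort_idx = A_1D.argsort()
    mask = np.append(True, np.diff(A_1D[sort_idx]) != 0)
    counts = np.bincount(mask.cumsum() - 1)
    return A[sort_idx[np.nonzero(mask)[0][counts == 1]]]

def canonical(rows):
    # Hypothetical helper: put rows in one fixed order so that
    # results from different methods can be compared row by row
    return rows[np.lexsort(rows.T)]

np.random.seed(0)  # fixed seed, chosen arbitrarily for repeatability
A = np.random.randint(20, 30, (10000, 5))

# Reference: unique rows with their counts, keeping rows seen exactly once
uniq, counts = np.unique(A, axis=0, return_counts=True)
ref = canonical(uniq[counts == 1])

same_v1 = np.array_equal(canonical(unq_rows_v1(A)), ref)
same_v2 = np.array_equal(canonical(unq_rows_v2(A)), ref)
print(same_v1, same_v2)
```

Both flags should print as True if the two functions agree with the reference on the set of rows returned.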