How do I get indices of N maximum values in a NumPy array? Answers

【Question title】: How do I get indices of N maximum values in a NumPy array?
【Posted】: 2011-10-18 03:31:17
【Question】:

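NumPy provides a way to get the index of the maximum value of an array via np.argmax.

I would like something similar, but returning the indices of the N largest values.

For instance, if I have the array [1, 3, 2, 4, 5], then function(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].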

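【Comments】:

Possible duplicate of python+numpy: efficient way to take the min/max n values and indices from a matrix
Your question is not really well defined. For example, what would the indices (you expect) be for array([5, 1, 5, 5, 2, 3, 2, 4, 1, 5]) with n = 3? Which of the alternatives, such as [0, 2, 3], [0, 2, 9], ..., would be the correct one? Please elaborate on your specific requirements. Thanks
@eat, I don't really care which one is returned in this specific case. Even if it seems logical to return the first one encountered, that is not a requirement for me.
argsort might be a viable alternative if you do not care about the order of the returned indices. See my answer below.

Tags: python numpy max numpy-ndarray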

【Solution 1】:

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])

>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])

>>> top4 = a[ind]
>>> top4
array([4, 9, 6, 9])

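Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need them sorted too, sort them afterwards: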

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

Getting the top-k elements in sorted order this way takes O(n + k log k) time.
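
The two steps above can be wrapped in a small helper. This is only a sketch: the name top_k_indices and the handling of k <= 0 or k larger than the array are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def top_k_indices(a, k):
    """Indices of the k largest values of a 1-D array, largest value first."""
    a = np.asarray(a)
    k = min(k, a.size)          # assumption: clamp k instead of raising
    if k <= 0:
        return np.empty(0, dtype=np.intp)
    # O(n): the last k slots hold the k largest values, in no particular order.
    candidates = np.argpartition(a, -k)[-k:]
    # O(k log k): order only those k candidates, descending by value.
    return candidates[np.argsort(a[candidates])[::-1]]

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
idx = top_k_indices(a, 4)
print(a[idx])  # [9 9 6 4]
```

Which of the tied indices is reported (for example, which of the three 4s) is not specified by argpartition, so only the values are shown here.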

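【Discussion】:

@varela argpartition runs in linear time, O(n), using the introselect algorithm. The subsequent sort only handles k elements, so it runs in O(k log k).
If anybody is wondering how exactly np.argpartition and its sister algorithm np.partition work, there is a more detailed explanation in the linked question: stackoverflow.com/questions/10337533/…
@FredFoo: why did you use -4? Did you do that to start from the end? (k being positive or negative works the same for me! It only prints the smallest numbers first!)
@LKT use a=np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0]) because plain Python lists, unlike np.array, do not support indexing by a list
@Umangsinghal np.argpartition takes an optional axis argument. To find the indices of the top n values for each row: np.argpartition(a, -n, axis=1)[:, -n:]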

【Solution 2】:

The simplest I've been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

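This involves a complete sort of the array. I wonder whether numpy provides a built-in way to do a partial sort; so far I haven't been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth coding something up in Cython.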

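【Discussion】:

Could line 3 be written equivalently as arr.argsort()[-1:-4:-1]? I've tried it in the interpreter and it gives the same result, but I'm wondering whether some example would break it.
@abroekhof Yes, that should be equivalent for any list or array. Alternatively, this can be done without the reversal by using np.argsort(-arr)[:3], which I find more readable and to the point.
What does [::-1] mean? @NPE
arr.argsort()[::-1][:n] is better because it returns an empty result for n=0 instead of the full array
@NPE numpy has the function argpartition, which isolates the top K elements from the rest without doing a full sort; sorting can then be done on only those K.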
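
The n = 0 edge case raised in the comments can be checked directly. The two helper names are hypothetical and exist only to put the two slicing orders side by side.

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])

def top_n_tail_slice(arr, n):
    # Slice the tail, then reverse. For n == 0 the slice [-0:] is [0:],
    # so the whole array comes back.
    return arr.argsort()[-n:][::-1]

def top_n_reverse_first(arr, n):
    # Reverse first, then take the first n. For n == 0 this is empty.
    return arr.argsort()[::-1][:n]

print(top_n_tail_slice(arr, 3))          # [4 3 1]
print(top_n_reverse_first(arr, 3))       # [4 3 1]
print(len(top_n_tail_slice(arr, 0)))     # 5
print(len(top_n_reverse_first(arr, 0)))  # 0
```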

【Solution 3】:

Simpler yet:

idx = (-arr).argsort()[:n]

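where n is the number of maximum values.

【Discussion】:

Can this be done for a 2D array? If not, do you perhaps know how?
@AndrewHundt : simply use (-arr).argsort(axis=-1)[:, :n]
A similar approach is arr[arr.argsort()[-n:]]: instead of negating the array, just take a slice of the last n elements
ind = np.argsort(-arr,axis=0)[:4] worked for me to find the first 4 indices column-wise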

【Solution 4】:

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq — Heap queue algorithm

【Discussion】:

There's no need for a loop at all here: heapq.nlargest(3, xrange(len(a)), a.take). For Python lists we can use .__getitem__ instead of .take.
For n-dimensional arrays A in general: heapq.nlargest(3, range(len(A.ravel())), A.ravel().take). (I hope this only operates on views; see also ravel vs flatten: stackoverflow.com/a/28930580/603003.)
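
For the n-dimensional case mentioned in the last comment, the flat indices returned by heapq.nlargest can be mapped back to coordinates with np.unravel_index. A minimal sketch with made-up data:

```python
import heapq
import numpy as np

A = np.array([[1, 9, 3],
              [7, 2, 8]])

flat = A.ravel()  # a view when A is contiguous, otherwise a copy
flat_idx = heapq.nlargest(3, range(flat.size), key=flat.__getitem__)
rows, cols = np.unravel_index(flat_idx, A.shape)

print(flat_idx)                                 # [1, 5, 3]
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 1), (1, 2), (1, 0)]
```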

【Solution 5】:

If you happen to be working with a multidimensional array, you'll need to flatten it and unravel the indices:

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

【Discussion】:

【Solution 6】:

If you don't care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort through argsort.

K = 4 # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a,-K)[-K:]
array([4, 1, 5, 6])

Credits go to this question.

I ran a few tests, and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.
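
The answer does not include its test code. One way such a comparison could be set up is sketched below; the array size, seed, K, and repeat count are arbitrary assumptions, and the timings will vary by machine, so none are quoted.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
a = rng.random(200_000)          # size chosen only for illustration
K = 10

def with_argsort():
    return np.argsort(a)[-K:]            # full sort, then take the tail

def with_argpartition():
    return np.argpartition(a, -K)[-K:]   # partial partition only

# Both pick the same K elements; only the order of the indices may differ.
same = set(with_argsort().tolist()) == set(with_argpartition().tolist())
print("same index set:", same)

for fn in (with_argsort, with_argpartition):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds:.5f} s per call")
```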

【Discussion】:

【Solution 7】:

For multidimensional arrays you can use the axis keyword to apply the partitioning along the desired axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won't return a sorted result. In that case you can use np.argsort() along the intended axis:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])

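【Discussion】:

I think you can simplify the indexing here by using np.take_along_axis (which likely didn't exist when you answered this question)
The default axis parameter for np.argpartition is -1, so there is no need to set it to 1 in the 2D case.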
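
The np.take_along_axis suggestion from the comments might look like the following sketch. The small array and N are made up for illustration; the optional sorting step is an addition, not part of the original answer.

```python
import numpy as np

arr = np.array([[ 7, 11, 12,  0,  2],
                [16,  4,  3, 18,  5],
                [ 2,  9, 15, 12, 18]])
N = 2

# Indices of the N largest entries in each row (unordered within the row).
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

# Gather the values without building row indices by hand.
values = np.take_along_axis(arr, indices, axis=1)

# Optional: order each row's picks from largest to smallest.
order = np.argsort(-values, axis=1)
indices_sorted = np.take_along_axis(indices, order, axis=1)
values_sorted = np.take_along_axis(values, order, axis=1)

print(values_sorted)
# [[12 11]
#  [18 16]
#  [18 15]]
```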

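【Solution 8】:

Three Answers Compared For Coding Ease And Speed

Speed was important for my needs, so I tested three answers to this question.

Code from those three answers was modified as needed for my specific case.

I then compared the speed of each method.

Coding-wise:

NPE's answer was the next most elegant and adequately fast for my needs.
Fred Foo's answer required the most refactoring for my needs but was the fastest. I went with this answer because, even though it took more work, it was not too bad and had significant speed advantages.
off99555's answer was the most elegant, but it is the slowest.

Complete Code for Test and Comparisons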

import numpy as np
import time
import random
import sys
from operator import itemgetter
from heapq import nlargest

''' Fake Data Setup '''
a1 = list(range(1000000))
random.shuffle(a1)
a1 = np.array(a1)

''' ################################################ '''
''' NPE's Answer Modified A Bit For My Case '''
t0 = time.time()
indices = np.flip(np.argsort(a1))[:5]
results = []
for index in indices:
    results.append((index, a1[index]))
t1 = time.time()
print("NPE's Answer:")
print(results)
print(t1 - t0)
print()

''' Fred Foos Answer Modified A Bit For My Case'''
t0 = time.time()
indices = np.argpartition(a1, -6)[-5:]
results = []
for index in indices:
    results.append((a1[index], index))
results.sort(reverse=True)
results = [(b, a) for a, b in results]
t1 = time.time()
print("Fred Foo's Answer:")
print(results)
print(t1 - t0)
print()

''' off99555's Answer - No Modification Needed For My Needs '''
t0 = time.time()
result = nlargest(5, enumerate(a1), itemgetter(1))
t1 = time.time()
print("off99555's Answer:")
print(result)
print(t1 - t0)

Output with Speed Reports

NPE's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.1349949836730957

Fred Foo's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.011161565780639648

off99555's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.439760684967041

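【Discussion】:

【Solution 9】:

The method np.argpartition only returns the indices of the k largest values. It performs a partial sort and is faster than np.argsort (which performs a full sort) when the array is quite large. But the returned indices are NOT in ascending/descending order. Let's illustrate with an example: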
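
The original example was shared as a screenshot and is not reproduced here. A minimal NumPy sketch of the same behaviour, with made-up data, could be:

```python
import numpy as np

a = np.array([3, 1, 7, 5, 9, 2, 8])
k = 3

# Top-k indices; the order inside this slice is an implementation detail.
part = np.argpartition(a, -k)[-k:]
print(sorted(a[part].tolist()))  # [7, 8, 9]

# Sorting just those k candidates gives a strict descending order.
ordered = part[np.argsort(-a[part])]
print(ordered)     # [4 6 2]
print(a[ordered])  # [9 8 7]
```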

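We can see that if you want the top k indices in strict ascending order, np.argpartition won't return what you want.

Apart from sorting manually after np.argpartition, my solution is to use torch.topk from PyTorch, a tool for building neural networks that provides NumPy-like APIs with both CPU and GPU support. It is as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.

The code for strictly ascending/descending top k indices would be:

Note that torch.topk accepts a torch tensor and returns both the top k values and the top k indices as torch.Tensor objects. Like NumPy functions with their axis argument, torch.topk accepts a dim argument, so you can handle multi-dimensional arrays/tensors.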

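【Discussion】:

The code snippets were lost because you shared them as screenshots. Code blocks would be much appreciated.

【Solution 10】:

This will be faster than a full sort, depending on the size of your original array and the size of your selection: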

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i]=idx; A[idx]=0 #something smaller than A.min()
...     
>>> B
array([0, 2, 3])

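This, of course, involves tampering with your original array. You could fix that (if needed) by making a copy or by putting the original values back, whichever is cheaper for your use case.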
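
A copy-based variant of the loop above is sketched here. It assumes a 1-D numeric array with n no larger than its size, and it uses -inf on a float copy as the sentinel so that the masked value is below every remaining element; the function name is hypothetical. Very large integers may lose precision in the float copy.

```python
import numpy as np

def top_n_by_repeated_argmax(A, n):
    """Indices of the n largest values, largest first; A is left untouched."""
    work = np.array(A, dtype=float)   # float copy so -inf can be stored
    out = np.empty(n, dtype=np.intp)
    for i in range(n):
        idx = np.argmax(work)         # first occurrence wins on ties
        out[i] = idx
        work[idx] = -np.inf           # below every finite value
    return out

A = np.array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
result = top_n_by_repeated_argmax(A, 3)
print(result)  # [0 2 3]
print(A)       # [5 1 5 5 2 3 2 4 1 0]
```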

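【Discussion】:

FWIW, your solution won't give an unambiguous result in all situations. The OP should describe how to handle these ambiguous cases. Thanks
@eat The OP's question is a little ambiguous. An implementation, however, is not really open to interpretation. :) The OP should simply refer to the definition of np.argmax docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html to be sure this specific solution meets the requirements. It's possible that any solution meeting the OP's stated requirement is acceptable.
Well, one might consider the implementation of argmax(.) to be unambiguous as well. (IMHO it tries to follow some kind of short-circuiting logic, but unfortunately fails to provide universally acceptable behavior.) Thanks

【Solution 11】:

Use: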

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

Now the result list will contain N tuples (index, value), ordered from the largest value down.
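
As a concrete illustration, with sample data chosen here rather than taken from the answer:

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]   # sample data for illustration
N = 3

# Compare the (index, value) pairs by their value component.
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)   # [(4, 5), (3, 4), (1, 3)]

# Keep only the indices if the values are not needed.
indices = [i for i, _ in result]
print(indices)  # [4, 3, 1]
```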

【Discussion】:

【Solution 12】:

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

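【Discussion】:

Works well, but gives more results if you have duplicate (maximum) values in your array A. I would expect exactly k results, but with duplicate values you get more than k results.
I slightly modified the code. The returned list of indices now has length exactly k. If you have duplicates, they are grouped into a single tuple.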
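
If exactly k individual positions are wanted even when values repeat, one alternative (not the answer's method) is a stable argsort on the negated values, which breaks ties by first occurrence. The function name is hypothetical, and the sketch assumes a signed or floating-point dtype so that negation is safe.

```python
import numpy as np

def top_k_first_occurrence(arr, k):
    """Exactly k positions; equal values keep their original row-major order."""
    arr = np.asarray(arr)
    # Stable sort on the negated values: descending by value, ties first-seen.
    flat_order = np.argsort(-arr.ravel(), kind="stable")[:k]
    return np.unravel_index(flat_order, arr.shape)

A = np.array([[0.5, 0.7],
              [0.9, 0.2],
              [0.9, 0.2]])
rows, cols = top_k_first_occurrence(A, 3)
print(list(zip(rows.tolist(), cols.tolist())))  # [(1, 0), (2, 0), (0, 1)]
print(A[rows, cols])                            # [0.9 0.9 0.7]
```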

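【Solution 13】:

Below is a very simple way to see the maximum elements and their positions. Here axis selects the direction of the reduction: in the 2D case, axis=0 gives the maximum of each column and axis=1 gives the maximum of each row. For higher dimensions, pick the axis you need.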

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))
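
With a fixed matrix instead of random data, the effect of each axis is easier to follow. Note that this yields a single maximum per column or row, not the top N. The matrix values are made up for illustration.

```python
import numpy as np

M = np.array([[3, 8, 1, 5],
              [9, 2, 7, 4],
              [6, 0, 2, 6]])

# axis=0: one maximum per column, with the row where it occurs.
print(M.max(axis=0), M.argmax(axis=0))  # [9 8 7 6] [1 0 1 2]

# axis=1: one maximum per row, with the column where it occurs
# (the first occurrence is reported when a row has ties).
print(M.max(axis=1), M.argmax(axis=1))  # [8 9 6] [1 0 0]
```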

【Discussion】:

I used this link: jakevdp.github.io/PythonDataScienceHandbook/…

【Solution 14】:

Here's a more complicated way that increases n if the n-th value is tied with others:

>>> def get_top_n_plus_ties(arr, n):
...     sorted_args = np.argsort(-arr)
...     thresh = arr[sorted_args[n - 1]]  # value of the n-th largest element
...     n_ = np.sum(arr >= thresh)        # count everything tied with or above it
...     return sorted_args[:n_]
...
>>> get_top_n_plus_ties(np.array([2,9,8,3,0,2,8,3,1,9,5]),3)
array([1, 9, 2, 6])

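【Discussion】:

【Solution 15】:

I found it most intuitive to use np.unique.

The idea is that np.unique, called with return_inverse=True, also returns for each input value its index into the sorted unique values. Comparing those inverse indices with the position of the largest unique value recovers every position of the maximum in the original input.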

multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
>> [4 7]

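【Discussion】:

【Solution 16】:

I think the most time-efficient way is to iterate through the array manually and maintain a min-heap of size k, as other people have mentioned.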
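
The answer gives no code for the heap approach. A minimal sketch of a size-k min-heap scan, with a hypothetical function name and tie-breaking chosen for illustration, could be:

```python
import heapq
import numpy as np

def top_k_min_heap(values, k):
    """Indices of the k largest values, largest first, via a size-k min-heap."""
    if k <= 0:
        return []
    items = np.asarray(values).tolist()  # plain Python scalars
    heap = []  # (value, index) pairs; heap[0] is the smallest value kept
    for index, value in enumerate(items):
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        elif value > heap[0][0]:
            # Beats the current k-th largest: replace it in O(log k).
            heapq.heapreplace(heap, (value, index))
    # Largest value first; earlier index first among equal values.
    return [i for _, i in sorted(heap, key=lambda t: (-t[0], t[1]))]

my_array = np.array([1, 3, 2, 4, 5])
print(top_k_min_heap(my_array, 3))  # [4, 3, 1]
```

The scan costs O(n log k) and never modifies the input, at the price of a Python-level loop that is slow for large arrays compared with argpartition.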

I also came up with a brute-force approach:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')

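After using argmax to get the index of the largest element, set that element to a large negative value. The next call to argmax will then return the second-largest element. If needed, you can record the original values of these elements and restore them afterwards.

【Discussion】:

TypeError: 'float' object cannot be interpreted as an integer

【Solution 17】:

This code works for a NumPy 2D matrix array: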

mat = np.array([[1, 3], [2, 5]])  # numpy matrix

n = 2  # number of largest elements to select
n_largest_mat = np.sort(mat, axis=None)[-n:]  # the n largest values
tf_n_largest = np.zeros(mat.shape, dtype=bool)  # all-False mask, same shape as mat
for x in n_largest_mat:
    tf_n_largest = tf_n_largest | (mat == x)  # mark every position holding one of those values

n_largest_elems = mat[tf_n_largest]  # boolean-mask indexing

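This produces a true-false mask of the n largest entries, which can also be used to extract the n largest elements from the matrix array.

【Discussion】:

【Solution 18】: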

top_k

import numpy as np

def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    if reverse:
        axis_length = array.shape[axis]
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        partition_index = np.take(np.argpartition(array, kth=top_k, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # resort partition
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexes

if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    start_time = time.time()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print(time.time() - start_time)

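【Discussion】:

【Solution 19】:

You can simply use a dictionary to find the top k values and indices in a numpy array. For example, if you want to find the top 2 values and their indices: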

import numpy as np
nums = np.array([0.2, 0.3, 0.25, 0.15, 0.1])


def TopK(x, k):
    a = dict([(i, j) for i, j in enumerate(x)])
    sorted_a = dict(sorted(a.items(), key = lambda kv:kv[1], reverse=True))
    indices = list(sorted_a.keys())[:k]
    values = list(sorted_a.values())[:k]
    return (indices, values)

print(f"Indices: {TopK(nums, k = 2)[0]}")
print(f"Values: {TopK(nums, k = 2)[1]}")


Indices: [1, 2]
Values: [0.3, 0.25]

【Discussion】:

【Solution 20】:

A vectorized 2D implementation using argpartition:

k = 3
probas = np.array([
    [.6, .1, .15, .15],
    [.1, .6, .15, .15],
    [.3, .1, .6, 0],
])

k_indices = np.argpartition(-probas, k-1, axis=-1)[:, :k]

# adjust indices to apply in flat array
adjuster = np.arange(probas.shape[0]) * probas.shape[1]
adjuster = np.broadcast_to(adjuster[:, None], k_indices.shape)
k_indices_flat = k_indices + adjuster

k_values = probas.flatten()[k_indices_flat]

# k_indices:
# array([[0, 2, 3],
#        [1, 2, 3],
#        [2, 0, 1]])
# k_values:
# array([[0.6 , 0.15, 0.15],
#        [0.6 , 0.15, 0.15],
#       [0.6 , 0.3 , 0.1 ]])

【Discussion】: