【Question title】: Sorting nonzero elements of a numpy array and getting their indices
【Posted】: 2017-02-13 19:46:42
【Question】:

I have a numpy array:

x = numpy.array([0.1, 0, 2, 3, 0, -0.5])

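I want to get an array y containing the nonzero elements of x sorted in descending order, and idx, the corresponding indices into x.

For the example above, y would be [3, 2, 0.1, -0.5] and idx would be [3, 2, 0, 5]. I would prefer a method that extends to 2D arrays without looping over the rows of x.

If I have the 2D example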

x = [[0.1, 0, 2, 3, 0, -0.5],
     [1, 0, 0, 0, 0, 2 ]] 

I would like to get

y = [[3, 2, 0.1, -0.5], [2, 1]] and
idx = [[3, 2, 0, 5], [5, 0]].

【Comments on the question】:

  • Did any of the posted solutions work for you?

Tags: python sorting numpy


【Solution 1】:

Here are two vectorized approaches, for the 1D and 2D cases respectively:

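import numpy as np

def sort_nonzeros1D(x):
    # Indices that sort x in ascending order
    sidx = np.argsort(x)
    # Keep only the indices of nonzero entries, then flip to descending order
    # (np.isin replaces np.in1d, which is deprecated since NumPy 2.0)
    out_idx = sidx[np.isin(sidx, np.flatnonzero(x != 0))][::-1]
    out_x = x[out_idx]
    return out_x, out_idx

def sort_nonzeros2D(x):
    # Turn zeros into NaN; argsort puts NaN last, so after flipping each row
    # the zero positions come first, followed by nonzeros in descending order
    x1 = np.where(x == 0, np.nan, x)
    sidx = np.argsort(x1, 1)[:, ::-1]

    n = x.shape[1]
    extent_idx = (x == 0).sum(1)                       # number of zeros per row
    valid_mask = extent_idx[:, None] <= np.arange(n)   # skip the leading zero slots
    split_idx = (n - extent_idx[:-1]).cumsum()         # row boundaries in the flat result

    out_idx = np.split(sidx[valid_mask], split_idx)
    y = x[np.arange(x.shape[0])[:, None], sidx]
    out_x = np.split(y[valid_mask], split_idx)
    return out_x, out_idx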

Sample run

1D case:

In [461]: x
Out[461]: array([ 0.1,  0. ,  2. ,  3. ,  0. , -0.5])

In [462]: sort_nonzeros1D(x)
Out[462]: (array([ 3. ,  2. ,  0.1, -0.5]), array([3, 2, 0, 5]))

2D case:

In [470]: x
Out[470]: 
array([[ 0.1,  0. ,  2. ,  3. ,  0. , -0.5],
       [ 1. ,  0. ,  0. ,  0. ,  0. ,  2. ],
       [ 7. ,  0. ,  2. ,  5. ,  1. ,  0. ]])

In [471]: sort_nonzeros2D(x)
Out[471]: 
([array([ 3. ,  2. ,  0.1, -0.5]),
  array([ 2.,  1.]),
  array([ 7.,  5.,  2.,  1.])],
 [array([3, 2, 0, 5]), array([5, 0]), array([0, 3, 2, 4])])
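
As a possible simplification of the 1D case, the membership test can be replaced by a boolean mask applied to the sorted values. This is only a sketch, not the answerer's code, and the function name `sort_nonzeros1d_mask` is hypothetical.

```python
import numpy as np

def sort_nonzeros1d_mask(x):
    # Hypothetical helper, not part of the original answer.
    x = np.asarray(x)
    sidx = np.argsort(x)[::-1]      # indices of x in descending order of value
    out_idx = sidx[x[sidx] != 0]    # drop the positions that hold zeros
    return x[out_idx], out_idx

x = np.array([0.1, 0, 2, 3, 0, -0.5])
y, idx = sort_nonzeros1d_mask(x)
print(y, idx)
```

For the question's 1D input this should reproduce the y and idx the asker listed.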

【Comments】:

    【Solution 2】:

    Here is another solution:

    nzidx = np.where(x)
    ranking = np.argsort(x[nzidx]) # append [::-1] for descending order
    result = tuple(np.array(nzidx)[:, ranking])
    

    x[result] retrieves the elements in sorted order, regardless of the number of dimensions.

    Demo:

    >>> x
    array([[ 0.        , -1.36688591,  0.12606516, -1.8546047 ,  0.        ,  0.39758545],
           [ 0.65160821, -1.80074214,  0.        ,  0.        ,  1.20758375,  0.33281977]])
    >>> nzidx = np.where(x)
    >>> ranking = np.argsort(x[nzidx])
    >>> result = tuple(np.array(nzidx)[:, ranking])
    >>> 
    >>> result
    (array([0, 1, 0, 0, 1, 0, 1, 1]), array([3, 1, 1, 2, 5, 5, 0, 4]))
    >>> x[result]
    array([-1.8546047 , -1.80074214, -1.36688591,  0.12606516,  0.33281977,
            0.39758545,  0.65160821,  1.20758375])
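
The three lines above can be wrapped into a small function, which also makes the descending variant from the code comment explicit. This is a minimal sketch; the name `sorted_nonzero_index` and the `descending` parameter are assumptions for illustration, not part of the answer.

```python
import numpy as np

def sorted_nonzero_index(x, descending=False):
    # Hypothetical wrapper around the np.where / np.argsort steps above.
    x = np.asarray(x)
    nzidx = np.where(x)                 # one index array per dimension
    ranking = np.argsort(x[nzidx])      # ascending order of the nonzero values
    if descending:
        ranking = ranking[::-1]
    return tuple(np.array(nzidx)[:, ranking])

x1 = np.array([0.1, 0, 2, 3, 0, -0.5])
idx = sorted_nonzero_index(x1, descending=True)
print(idx[0])    # indices into x1
print(x1[idx])   # the nonzero values, largest first
```

Because the result is a tuple with one array per dimension, the same call works on a 2D array, where it sorts over the whole array rather than row by row.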
    

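    Update:

    If the sorting should be done row by row, we can use a list comprehension (first snippet), or call np.where once and split its output into per-row blocks (second snippet):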

    nzidx = [np.where(r)[0] for r in x]
    ranking = [np.argsort(r[idx]) for r, idx in zip(x, nzidx)]
    result = [idx[rk] for idx, rk in zip(nzidx, ranking)]
    

    nzidx = np.where(x)
    blocks = np.searchsorted(nzidx[0], np.arange(1, x.shape[0]))
    ranking = [np.argsort(r) for r in np.split(x[nzidx], blocks)]
    result = [idx[rk] for idx, rk in zip(np.split(nzidx[1], blocks), ranking)]
    

    Demo:

    >>> x
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.1218789 ,
             0.        ,  0.        ,  0.        ],
           [ 0.        , -0.6445128 , -0.00603869,  1.47947823, -1.4370367 ,
             0.        ,  1.11606385, -1.22169137],
           [ 0.        ,  0.        ,  0.        ,  1.54048119, -0.85764299,
             0.        ,  0.        ,  0.32325807]])
    >>> nzidx = np.where(x)
    >>> blocks = np.searchsorted(nzidx[0], np.arange(1, x.shape[0]))
    >>> ranking = [np.argsort(r) for r in np.split(x[nzidx], blocks)]
    >>> result = [idx[rk] for idx, rk in zip(np.split(nzidx[1], blocks), ranking)]
    >>> # package them
    ... [(r[idx], idx) for r, idx in zip(x, result)]
    [(array([ 0.1218789]), array([4])), (array([-1.4370367 , -1.22169137, -0.6445128 , -0.00603869,  1.11606385,
            1.47947823]), array([4, 7, 1, 2, 6, 3])), (array([-0.85764299,  0.32325807,  1.54048119]), array([4, 7, 3]))]
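
To connect the row-wise variant back to the question, here is a sketch that applies the same np.where / np.searchsorted / np.split idea in descending order and returns both the values and the column indices. The function name `sort_nonzeros_rowwise` is hypothetical, and the list comprehensions still iterate over rows, as in the answer.

```python
import numpy as np

def sort_nonzeros_rowwise(x):
    # Hypothetical helper built on the np.where / np.split approach above.
    x = np.asarray(x)
    rows, cols = np.where(x)            # nonzero positions, in row-major order
    # Position in `rows` where each new row starts (rows is already sorted).
    blocks = np.searchsorted(rows, np.arange(1, x.shape[0]))
    vals = np.split(x[rows, cols], blocks)
    col_blocks = np.split(cols, blocks)
    order = [np.argsort(v)[::-1] for v in vals]   # descending within each row
    y = [v[o] for v, o in zip(vals, order)]
    idx = [c[o] for c, o in zip(col_blocks, order)]
    return y, idx

x2 = np.array([[0.1, 0, 2, 3, 0, -0.5],
               [1, 0, 0, 0, 0, 2]])
y, idx = sort_nonzeros_rowwise(x2)
print(y)
print(idx)
```

A row that contains only zeros simply yields an empty block, since np.searchsorted returns the same boundary twice for it.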
    

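    【Comments】:

    • Thanks, I should have explained this better. Your solution gives the result for the whole array, but I want to sort row by row. For the example above, I need, for each row, a list of its sorted nonzero elements and their indices.
    【Solution 3】:

    Here is a non-numpy approach: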

    # create (index, value) tuple pairs for each value in `x` if value isn't 0
    idxs_vals = [(idx, val) for idx, val in enumerate(x) if val != 0]
    
    # sort the tuples from above by value, largest first
    s_idxs_vals = sorted(idxs_vals, key=lambda pair: -pair[1])
    
    # grab the value from each tuple
    y = [j for i, j in s_idxs_vals]
    
    # grab the index from each tuple
    idxs = [i for i, j in s_idxs_vals]
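
The steps above could be collected into a function and applied to each row of the 2D example. This is a plain-Python sketch with a hypothetical name, `sort_nonzeros_py`; note that it loops over the rows, which the asker hoped to avoid.

```python
def sort_nonzeros_py(row):
    # Hypothetical helper wrapping the steps above.
    pairs = [(i, v) for i, v in enumerate(row) if v != 0]   # (index, value) for nonzeros
    pairs.sort(key=lambda p: p[1], reverse=True)            # largest value first
    idxs = [i for i, _ in pairs]
    vals = [v for _, v in pairs]
    return vals, idxs

x2 = [[0.1, 0, 2, 3, 0, -0.5],
      [1, 0, 0, 0, 0, 2]]
results = [sort_nonzeros_py(row) for row in x2]
y = [vals for vals, _ in results]
idx = [idxs for _, idxs in results]
print(y)
print(idx)
```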
    

    【Comments】:
