Find number of zeros before non-zero in a numpy array: answers

【Question title】: Find number of zeros before non-zero in a numpy array
【Posted】: 2014-06-08 12:21:10
【Question】:

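I have a numpy array A. I would like to return the number of zeros before the first non-zero in A in an efficient way, because this runs inside a loop.

If A = np.array([0,1,2]) then np.nonzero(A)[0][0] returns 1. But if A = np.array([0,0,0]) this doesn't work (in that case I want the answer 3). Also, if A is very large and the first non-zero value is near the beginning, this seems inefficient.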

【Comments】:

Related question: stackoverflow.com/questions/7632963/…
Related ticket: github.com/numpy/numpy/issues/2269
@shx2 Hmm.. that ticket basically stalled 2 years ago, and then it points to another ticket that died 10 months ago.

Tags: python performance numpy

【Solution 1】:

If you don't care about speed, I have a small trick that does the job:

import numpy as np

a = np.array([0,0,1,1,1])
# t[i] == 2 exactly where a[i] is zero and a[i+1] is non-zero
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print(t)
# [1 2 1 1 0]
print(np.where(t==2))
# (array([1]),)

【Comments】:

【Solution 2】:

i = np.argmax(A!=0)
if i==0 and np.all(A==0): i=len(A)

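This should be the most performant solution that needs no extensions. It is also easy to vectorize so that it operates along multiple axes.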
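
The remark about vectorizing along an axis can be sketched as follows. This is only an illustration, not code from the answer: the function name `count_leading_zeros_along_axis` and the sample array are assumptions made for the example.

```python
import numpy as np

def count_leading_zeros_along_axis(A, axis=-1):
    """Count zeros before the first non-zero entry along `axis` (hypothetical helper)."""
    A = np.asarray(A)
    nonzero = A != 0
    # argmax gives the index of the first True along the axis (0 if there is none).
    first = np.argmax(nonzero, axis=axis)
    # Slices without any non-zero entry report their full length instead.
    return np.where(nonzero.any(axis=axis), first, A.shape[axis])

A = np.array([[0, 1, 2],
              [0, 0, 0],
              [3, 0, 0]])
print(count_leading_zeros_along_axis(A, axis=1))  # [1 3 0]
print(count_leading_zeros_along_axis(A, axis=0))  # [2 0 0]
```

Like the two-line version, this still scans every element, so it does not stop early when the first non-zero is near the start.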

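【Comments】:

Strangely, this seems a lot slower in my timings. About 30 ms. What do you get?
Don't know; compared to what? I would expect performance similar to nonzero, except that we avoid constructing nonzero's output array.
An interesting comparison would be against the code that @Mr-E tested.

【Solution 3】:

Here is an iterative Cython version, which may be your best option if this is a serious bottleneck: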

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

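# np.int was removed in NumPy 1.24; use an explicit fixed-width dtype instead
DTYPE = np.int64
ctypedef np.int64_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    # Py_ssize_t avoids overflow on arrays with more than 2**31 - 1 elements
    cdef Py_ssize_t elements = a.size
    cdef Py_ssize_t i = 0
    cdef Py_ssize_t count = 0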
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

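This is similar to @mtrw's answer, but it indexes at native speed. My Cython is a bit sketchy, so there may be further improvements to be made.

A quick test of a very favorable case in IPython, using a few different methods: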

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int64)

In [6]: a[5] = 1

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

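However, I would only use something like this if I had evidence (from a profiler) that this was a bottleneck. Many things look inefficient but are never worth your time to fix.

【Comments】:

+1 for testing. Interestingly, even with so few iterations, the Python loop is 10 times slower than Cython. If the non-zero element sits further back in a large array, Cython's advantage will be even greater.

【Solution 4】:

I'm surprised nobody has used np.where yet.

np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0] will do the trick: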

>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 1
>> a = np.array([0,0,0])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 3
>> a = np.array([1,2,3])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 0
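
The one-liner above calls np.where(a) twice. A minimal sketch of the same idea that computes the indices once is shown below, assuming a 1-D input; the function name `count_leading_zeros_where` is made up for this example, and np.flatnonzero is used as a shorthand for np.where(a)[0] on 1-D data.

```python
import numpy as np

def count_leading_zeros_where(a):
    """Same logic as the one-liner, but the non-zero indices are computed once."""
    idx = np.flatnonzero(a)  # indices of the non-zero entries
    # The first non-zero index equals the number of leading zeros;
    # with no non-zero entries, every element is a leading zero.
    return int(idx[0]) if idx.size else int(np.size(a))

print(count_leading_zeros_where(np.array([0, 1, 2])))  # 1
print(count_leading_zeros_where(np.array([0, 0, 0])))  # 3
print(count_leading_zeros_where(np.array([1, 2, 3])))  # 0
```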

【Comments】:

【Solution 5】:

By appending a non-zero number to the end of the array, you can still use np.nonzero to get the result you want.

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3
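
As a possible variation on the sentinel idea, the appended non-zero value also guarantees that np.argmax finds a match, so no special case for an all-zero array is needed. The sketch below is an assumption-laden illustration (the name `count_leading_zeros_sentinel` is hypothetical), and note that np.hstack copies the whole array, so this is not a short-circuiting approach.

```python
import numpy as np

def count_leading_zeros_sentinel(a):
    """Append a non-zero sentinel, then take the index of the first non-zero entry."""
    padded = np.hstack((a, 1))  # copies `a` and adds one trailing non-zero value
    # argmax on a boolean array returns the position of the first True.
    return int(np.argmax(padded != 0))

print(count_leading_zeros_sentinel(np.array([0, 1, 2])))  # 1
print(count_leading_zeros_sentinel(np.array([0, 0, 0])))  # 3
```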

【Comments】:

【Solution 6】:

What's wrong with the naive approach:

def countLeadingZeros(x):
    """Count the number of elements up to the first non-zero element and return that count."""
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else: #short circuit evaluation, we found a non-zero so return immediately
            return ctr
    return ctr #we get here in the case that x was all zeros

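It returns as soon as it finds a non-zero element, so it is O(n) only in the worst case. You could make it faster by porting it to C, but it is worth testing whether that is really necessary for the arrays you are working with.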
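
The early-return behaviour can also be approximated without leaving NumPy by scanning the array in blocks. This is a different technique from the loop above, offered only as a rough sketch: the function name `count_leading_zeros_chunked` and the default block size of 4096 are assumptions, and whether it beats the plain loop depends on the data and should be measured.

```python
import numpy as np

def count_leading_zeros_chunked(a, chunk=4096):
    """Scan a 1-D array block by block and stop at the first block with a non-zero."""
    a = np.asarray(a)
    n = a.shape[0]
    for start in range(0, n, chunk):
        block = a[start:start + chunk]  # a view, so no copy is made
        nz = np.flatnonzero(block)
        if nz.size:
            # Offset of the block plus the position inside the block.
            return start + int(nz[0])
    return n  # every element was zero

a = np.zeros(10_000_000, dtype=np.int64)
a[5] = 1
print(count_leading_zeros_chunked(a))  # 5
```

Only the first block is examined in this example, so the cost does not grow with the length of the array when the first non-zero is near the start.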

【Comments】: