Count of islands of negative and positive numbers in a NumPy array: answers

【Question title】: Count of islands of negative and positive numbers in a NumPy array
【Posted】: 2017-06-10 13:21:28
【Question】:

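I have an array containing blocks of negative elements and blocks of positive elements. A very simplified example is an array a that looks like: array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])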

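(a<0).sum() and (a>0).sum() give me the total numbers of negative and positive elements, but how do I count them in order? I mean that I want to know that my array contains first 3 negative elements, then 6 positive elements, then 2 negative elements.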

This sounds like a topic that has already been discussed somewhere, and there may be a duplicate, but I couldn't find it.

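One approach is to call numpy.roll(a,1) in a loop over the whole array and count how often a given sign shows up in, for example, the first element as the array is rolled, but that does not look very numpyic (or pythonic), nor very efficient to me.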
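The rolled-copy idea does not actually need a Python loop: comparing the sign mask with a copy of itself rolled by one position marks where each run starts, in a single vectorized step. A minimal sketch, assuming a non-empty 1-D array and grouping zeros with the positives; the helper name `run_lengths_roll` is hypothetical:

```python
import numpy as np

def run_lengths_roll(a):
    # Hypothetical helper: lengths of consecutive same-sign runs, in order.
    # Assumes a non-empty 1-D array; zeros are grouped with the positives.
    neg = a < 0
    starts = neg != np.roll(neg, 1)   # True where the sign differs from the previous element
    starts[0] = True                  # np.roll wraps around, so force a run start at index 0
    start_idx = np.flatnonzero(starts)
    # Each run ends where the next one starts (or at the end of the array).
    return np.diff(np.append(start_idx, a.size))

a = np.array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])
print(run_lengths_roll(a))  # [3 6 2]
```

The sign of the first run is simply the sign of a[0], so the output matches the array([3,6,2]) format discussed in the comments.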

【Question comments】:

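This is similar: /questions/42129021/counting-consecutive-1s-in-numpy-array
What should the output be, in particular its format?
@Julien That is a hint, but not really what I'm asking. @Divakar: for the example I gave, the output could be an array, array([3,6,2]) (I can easily get the sign of the first element of my input array, which is the sign of the first element of the output array).
Since we are discussing efficiency in the question, I added a timing section to my post.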

Tags: python arrays numpy

【Solution 1】:

Here's a vectorized approach -

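import numpy as np

def pos_neg_counts(a):
    mask = a>0                                   # True for positive elements (zeros count as negative)
    idx = np.flatnonzero(mask[1:] != mask[:-1])  # last index of each run before a sign change
    if idx.size == 0:                            # no sign change: the whole array is a single run
        count = np.array([a.size])
    else:
        count = np.concatenate(( [idx[0]+1], idx[1:] - idx[:-1], [a.size-1-idx[-1]] ))
    if not mask[0]:
        return count[1::2], count[::2] # pos, neg counts
    else:
        return count[::2], count[1::2] # pos, neg counts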
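To see where the run lengths come from, the sketch below repeats the same steps as `pos_neg_counts` on the question's example and prints the intermediate arrays. Note that `count` by itself is already the ordered list of run lengths asked for in the comments; splitting it into positive and negative runs is the last step.

```python
import numpy as np

a = np.array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])
mask = a > 0
# Indices i where a[i] and a[i+1] fall on different sides of zero (last index of each run).
idx = np.flatnonzero(mask[1:] != mask[:-1])
print(idx)    # [2 8]
# Length of the first run, lengths of the middle runs, length of the last run.
count = np.concatenate(([idx[0] + 1], idx[1:] - idx[:-1], [a.size - 1 - idx[-1]]))
print(count)  # [3 6 2]
```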

Sample run -

In [155]: a
Out[155]: array([-3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])

In [156]: pos_neg_counts(a)
Out[156]: (array([6]), array([3, 2]))

In [157]: a[0] = 3

In [158]: a
Out[158]: array([ 3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])

In [159]: pos_neg_counts(a)
Out[159]: (array([1, 6]), array([2, 2]))

In [160]: a[-1] = 7

In [161]: a
Out[161]: array([ 3, -2, -1,  1,  2,  3,  4,  5,  6, -5,  7])

In [162]: pos_neg_counts(a)
Out[162]: (array([1, 6, 1]), array([2, 1]))

Runtime test

Other approach -

# @Franz's soln        
def split_app(my_array):
    negative_index = my_array<0
    splits = np.split(negative_index, np.where(np.diff(negative_index))[0]+1)
    len_list = [len(i) for i in splits]
    return len_list

Timings on a larger dataset -

In [20]: # Setup input array
    ...: reps = np.random.randint(3,10,(100000))
    ...: signs = np.ones(len(reps),dtype=int)
    ...: signs[::2] = -1
    ...: a = np.repeat(signs, reps)*np.random.randint(1,9,reps.sum())
    ...: 

In [21]: %timeit split_app(a)
10 loops, best of 3: 90.4 ms per loop

In [22]: %timeit pos_neg_counts(a)
100 loops, best of 3: 2.21 ms per loop
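Because the timing setup builds alternating-sign runs whose lengths are known in advance (`reps`), the same construction can double as a correctness check: the recovered run lengths should reproduce `reps` exactly. A sketch with a smaller input, using `np.random.default_rng` with a fixed seed (an assumption; the setup above uses the legacy `np.random` functions) and inlining the counting steps from `pos_neg_counts`:

```python
import numpy as np

rng = np.random.default_rng(0)
reps = rng.integers(3, 10, 1000)          # run lengths from 3 to 9
signs = np.ones(len(reps), dtype=int)
signs[::2] = -1                           # runs alternate: negative, positive, negative, ...
a = np.repeat(signs, reps) * rng.integers(1, 9, reps.sum())   # nonzero magnitudes 1..8

mask = a > 0
idx = np.flatnonzero(mask[1:] != mask[:-1])
count = np.concatenate(([idx[0] + 1], idx[1:] - idx[:-1], [a.size - 1 - idx[-1]]))

# The recovered run lengths should match the lengths used to build the array.
print(np.array_equal(count, reps))  # True
```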

【Comments】:

【Solution 2】:

Just use

my_array = np.array([-3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])
negative_index = my_array<0

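This gives you a boolean mask that is True for the negative values. Then you can split this array wherever the mask changes value: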

splits = np.split(negative_index, np.where(np.diff(negative_index))[0]+1)

Then count the sizes of the inner arrays:

len_list = [len(i) for i in splits]
print(len_list)

and you get what you want:

[3, 6, 2]
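The split positions passed to `np.split` can be inspected directly. For a boolean array, `np.diff` compares neighbouring elements and is True exactly where the value changes, so `np.where(...)[0] + 1` gives the index at which each new run begins. For the question's example:

```python
import numpy as np

my_array = np.array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])
negative_index = my_array < 0
# For booleans, np.diff yields True where neighbouring values differ.
change = np.diff(negative_index)
split_points = np.where(change)[0] + 1
print(split_points)               # [3 9]
splits = np.split(negative_index, split_points)
print([len(s) for s in splits])   # [3, 6, 2]
```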

You only need to keep track of the sign of your first element; in the data in my code, it is negative.

So just run:
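Rather than tracking the first element's sign separately, note that every piece returned by `np.split` holds a single mask value, so its first entry tells whether the run is negative. A small sketch; the list of `(label, length)` pairs is a hypothetical output format, not part of the original answer:

```python
import numpy as np

my_array = np.array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])
negative_index = my_array < 0
splits = np.split(negative_index, np.where(np.diff(negative_index))[0] + 1)
# All entries of a piece share one mask value, so piece[0] gives the run's sign.
# Zeros would be labelled "pos" here, because the mask is my_array < 0.
runs = [("neg" if piece[0] else "pos", len(piece)) for piece in splits]
print(runs)  # [('neg', 3), ('pos', 6), ('neg', 2)]
```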

import numpy as np

my_array = np.array([-3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])
negative_index = my_array<0
splits = np.split(negative_index, np.where(np.diff(negative_index))[0]+1)
len_list = [len(i) for i in splits]
print(len_list)

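【Comments】:

You can use negative_index = np.signbit(my_array) instead of negative_index = my_array<0. It should be faster.
I didn't know about np.signbit(). Thanks, very helpful.
@Franz Your solution outputs exactly the result I wanted, but Divakar's runs somewhat faster on my 2D array and has the advantage of clearly separating the positive and negative counts (which is actually easy to get from your solution as well).

【Solution 3】:

My (rather simple and probably inefficient) solution is: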
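One caveat: the two masks are not always identical for floating-point input. `np.signbit` reports the sign bit, so negative zero counts as negative, whereas `-0.0 < 0` is False. For integer input like the question's there is no negative zero, so both masks agree. A small check:

```python
import numpy as np

x = np.array([-1.5, -0.0, 0.0, 2.0])
print(x < 0)          # [ True False False False]
print(np.signbit(x))  # [ True  True False False]
```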

import numpy as np
arr = np.array([-3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])
sgn = np.sign(arr[0])
res = []
cntr = 1  # count the first element
for i in range(1, len(arr)):
    if np.sign(arr[i]) != sgn:  # sign changed: close the current run
        res.append(cntr)
        cntr = 0
        sgn *= -1               # assumes no zeros, so the sign simply flips
    cntr += 1
res.append(cntr)                # close the last run
print(res)
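A more compact pure-Python alternative to the explicit counter loop is `itertools.groupby`, which groups consecutive elements that share the same key. A sketch using `x < 0` as the key, so that zeros are grouped with the positives (a choice made here, not in the loop above); like the loop, it iterates element by element and will be much slower than the vectorized answers on large arrays:

```python
from itertools import groupby

import numpy as np

arr = np.array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])
# groupby starts a new group whenever the key (x < 0) changes value.
res = [sum(1 for _ in group) for _, group in groupby(arr, key=lambda x: x < 0)]
print(res)  # [3, 6, 2]
```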

【Comments】: