【Question title】: Numpy 2d array, obtain indexes of rows where specified column indexes equal 1
【Posted】: 2020-06-10 04:44:08
【Question】:

I have a 2D numpy array like this one, containing only the values 0 and 1.

a = np.array([[1, 0, 1, 0],  # Indexes 0 and 2 == 1
             [0, 1, 1, 0],   # Indexes 1 and 2 == 1
             [0, 1, 0, 1],   # Indexes 1 and 3 == 1
             [0, 1, 1, 1]])  # Indexes 1, 2, and 3 == 1

What I want is the index of every row in which all of the passed column indexes are equal to 1.

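For example, if the function that does this is get_rows, then get_rows(a, [1, 3]) should return [2, 3], because the rows at indexes 2 and 3 have column indexes 1 and 3 equal to 1. Similarly, get_rows(a, [1, 2]) should return [1, 3].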

I know how to do this with a Pandas DataFrame, but I would like to stick to pure numpy here. I have tried using np.where in some form, for example

np.where( ((a[i1 - 1] == 1) & (a[i2 - 1] == 1) ))

but this does not seem to give me what I want, and it does not work for a varying number of passed indexes.

【Question comments】:

Tags: python arrays numpy

【Solution 1】:

I think this is what you are looking for:

col_idx = [1, 2]
np.where(a[:,col_idx].all(axis=1))[0]

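You can pass any list of column indexes you like. The approach is fairly self-explanatory: extract the requested columns, then use np.where to find the rows in which they are all 1.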
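To make the one-liner easier to follow, here is a minimal sketch that breaks it into its three steps on the question's array. The intermediate names `sub`, `mask`, and `rows` are introduced only for illustration and do not appear in the original answer.

```python
import numpy as np

a = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 1]])

col_idx = [1, 3]

# Step 1: keep only the requested columns (shape: n_rows x len(col_idx)).
sub = a[:, col_idx]

# Step 2: True for each row whose selected entries are all non-zero.
mask = sub.all(axis=1)

# Step 3: np.where on a 1-D mask returns a 1-tuple; [0] pulls out the index array.
rows = np.where(mask)[0]

print(sub.tolist())   # [[0, 0], [1, 0], [1, 1], [1, 1]]
print(mask.tolist())  # [False, False, True, True]
print(rows.tolist())  # [2, 3]
```

Because `.all(axis=1)` reduces across whatever columns were selected, the same code works for any number of column indexes.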

Edit: following @Mad Physicist's suggestion, here is another, similar solution:

np.flatnonzero(a[:,col_idx].all(axis=1))

Example output for the input above:

[1 3]

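【Comments】:

nonzero is preferable to where in this case. In fact, you could use flatnonzero.
@MadPhysicist, could you help me understand why flatnonzero is better than where? Is it performance-related?
@JulianDrago There is at least a performance gain. Have a look at this post: stackoverflow.com/questions/47068017/…
@JulianDrago, because in this case where is documented as a wrapper around nonzero, and if you only need a single return value, flatnonzero is more efficient than nonzero, especially if the input is one-dimensional.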
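The difference in return types discussed above can be seen in a short sketch. The boolean `mask` here is a made-up stand-in for the result of `a[:, col_idx].all(axis=1)`; no timing claims are made.

```python
import numpy as np

# Hypothetical 1-D boolean mask, standing in for a[:, col_idx].all(axis=1).
mask = np.array([False, True, False, True])

as_where = np.where(mask)       # tuple holding one index array
as_nonzero = np.nonzero(mask)   # same tuple layout as np.where(mask)
as_flat = np.flatnonzero(mask)  # plain 1-D index array, nothing to unpack

print(type(as_where).__name__, as_where[0].tolist())      # tuple [1, 3]
print(type(as_nonzero).__name__, as_nonzero[0].tolist())  # tuple [1, 3]
print(type(as_flat).__name__, as_flat.tolist())           # ndarray [1, 3]
```

With np.where or np.nonzero the trailing `[0]` is needed to get at the indexes, while np.flatnonzero returns them directly.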

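【Solution 2】:

Solution

Try this.

Code - Logic

Find the row and column indexes of the entries that equal 1.
Keep only the rows whose column index matches one of those in target_col_index.
Narrow down the result by checking which row indexes occur as many times as there are columns in target_col_index.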
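The three steps above can be traced on the sample array to see what each intermediate value looks like. This is only an illustrative walk-through using the same variable names as the answer's function; the name `in_target` is introduced here for readability.

```python
import numpy as np

a = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 1]])
target_col_index = [1, 2]

# Step 1: coordinates of every entry equal to 1 (row-major order).
row_index, col_index = np.where(a == 1)
print(row_index.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3, 3]
print(col_index.tolist())  # [0, 2, 1, 2, 1, 3, 1, 2, 3]

# Step 2: keep the row index of each 1 that sits in a target column.
in_target = np.isin(col_index, target_col_index)
result = row_index[in_target]
print(result.tolist())     # [0, 1, 1, 2, 3, 3]

# Step 3: a row qualifies only if it was hit once per target column.
rows, counts = np.unique(result, return_counts=True)
print(rows.tolist(), counts.tolist())  # [0, 1, 2, 3] [1, 2, 1, 2]
print(rows[counts == len(target_col_index)].tolist())  # [1, 3]
```

Note that step 3 compares against `len(target_col_index)`, so this trace assumes the list contains no repeated column indexes.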

import numpy as np

# Requires `a` (dummy data) and get_row_index (custom function), both defined below.
target_col_index = [1, 2]
target_row_index = get_row_index(a, target_col_index)
print(target_row_index)

## Output
# [1 3]

## Other cases tested
# test_col_indexes = [ [0,1], [0,2], [0,3], [1,2], [1,3], [2,3], [0,1,3], [1,2,3] ]
# returned_row_indexes = [ [], [0], [], [1,3], [2,3], [3], [], [3] ]

Code - Custom function

def get_row_index(arr, target_col_index=None):
    if target_col_index is None:
        return None
    else:
        row_index, col_index = np.where(arr==1)
        result = row_index[np.isin(col_index, target_col_index)]
        rows, counts = np.unique(result, return_counts=True)
        target_row_index = rows[counts==len(target_col_index)]
        return target_row_index

Dummy data

a = np.array([[1, 0, 1, 0],  # Indexes 0 and 2 == 1
            [0, 1, 1, 0],   # Indexes 1 and 2 == 1
            [0, 1, 0, 1],   # Indexes 1 and 3 == 1
            [0, 1, 1, 1]])  # Indexes 1, 2, and 3 == 1
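
As a sanity check, the test cases listed in this answer can be run against both approaches from this thread. This is a sketch only: the helper names `rows_via_all` and `rows_via_counts` are made up for illustration, and the column lists are assumed to contain no duplicates.

```python
import numpy as np

a = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 1]])

def rows_via_all(arr, cols):
    # Approach from Solution 1: select the columns, reduce with all().
    return np.flatnonzero(arr[:, cols].all(axis=1))

def rows_via_counts(arr, cols):
    # Approach from Solution 2: count the hits per row (same logic as get_row_index).
    row_index, col_index = np.where(arr == 1)
    hits = row_index[np.isin(col_index, cols)]
    rows, counts = np.unique(hits, return_counts=True)
    return rows[counts == len(cols)]

test_col_indexes = [[0, 1], [0, 2], [0, 3], [1, 2],
                    [1, 3], [2, 3], [0, 1, 3], [1, 2, 3]]
expected = [[], [0], [], [1, 3], [2, 3], [3], [], [3]]

for cols, want in zip(test_col_indexes, expected):
    got_all = rows_via_all(a, cols).tolist()
    got_counts = rows_via_counts(a, cols).tolist()
    # Both approaches should agree with the expected rows listed in the answer.
    assert got_all == got_counts == want, (cols, got_all, got_counts, want)

print("all cases agree")
```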

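【Comments】:

If the argument passed is [1, 2], the returned answer should be [1, 3], because the rows at index 1 and index 3 have 1 at those column indexes. The output {0, 1, 2, 3} is incorrect; it is every row in the array.
Thanks. I am updating the answer. Please check it now.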